Modified cascade ribonucleoproteins and uses thereof

ABSTRACT

A clustered regularly interspaced short palindromic repeat (CRISPR)-associated complex for adaptive antiviral defence (Cascade); the Cascade protein complex comprising at least CRISPR-associated protein subunits Cas7, Cas5 and Cas6 which includes at least one subunit with an additional amino acid sequence possessing nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity. The Cascade complex with additional activity is combined with an RNA molecule to produce a ribonucleoprotein complex. The RNA molecule is selected to have substantial complementarity to a target sequence. Targeted ribonucleoproteins can be used as genetic engineering tools for precise cutting of nucleic acids in homologous recombination, non-homologous end joining, gene modification, gene integration, mutation repair or for their visualisation, transcriptional activation or repression. A pair of ribonucleotides fused to FokI dimers may be used to generate double-strand breakages in the DNA to facilitate these applications in a sequence-specific manner.

This application is a continuation of U.S. patent application Ser. No. 14/326,099, filed 8 Jul. 2014, now abandoned, which is a continuation of U.S. patent application Ser. No. 14/240,735, filed 24 Feb. 2014, now pending, which is a National Stage Entry of PCT/EP2012/076674, filed 21 Dec. 2012, now expired, which claims the benefit of priority under 35 U.S.C. 119(a)/(b) of United Kingdom Patent Application No. GB 1122458.1, filed 30 Dec. 2011, now expired, the contents of all of which are herein incorporated by reference in their entireties.

The present application contains a Sequence Listing that has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. The ASCII copy, created on 15 Jan. 2016, is named CBI010-13 ST25.txt and is 119 kb in size.

The invention relates to the field of genetic engineering and more particularly to the area of gene and/or genome modification of organisms, including prokaryotes and eukaryotes. The invention also concerns methods of making site specific tools for use in methods of genome analysis and genetic modification, whether in vivo or in vitro. The invention more particularly relates to the field of ribonucleoproteins which recognise and associate with nucleic acid sequences in a sequence specific way.

Bacteria and archaea have a wide variety of defense mechanisms against invasive DNA. So-called CRISPR/Cas defense systems provide adaptive immunity by integrating plasmid and viral DNA fragments in loci of clustered regularly interspaced short palindromic repeats (CRISPR) on the host chromosome. The viral or plasmid-derived sequences, known as spacers, are separated from each other by repeating host-derived sequences. These repetitive elements are the genetic memory of this immune system and each CRISPR locus contains a diverse repertoire of unique ‘spacer’ sequences acquired during previous encounters with foreign genetic elements.

Acquisition of foreign DNA is the first step of immunization, but protection requires that the CRISPR is transcribed and that these long transcripts are processed into short CRISPR-derived RNAs (crRNAs) that each contain a unique spacer sequence complementary to a foreign nucleic acid challenger.

In addition to the crRNA, genetic experiments in several organisms have revealed that a unique set of CRISPR-associated (Cas) proteins is required for the steps of acquiring immunity, for crRNA biogenesis and for targeted interference. Also, a subset of Cas proteins from phylogenetically distinct CRISPR systems has been shown to assemble into large complexes that include a crRNA.

A recent re-evaluation of the diversity of CRISPR/Cas systems has resulted in a classification into three distinct types (Makarova K. et al. (2011) Nature Reviews Microbiology, AOP 9 May 2011; doi:10.1038/nrmicro2577) that vary in cas gene content, and display major differences throughout the CRISPR defense pathway. (The Makarova classification and nomenclature for CRISPR-associated genes is adopted in the present specification.) RNA transcripts of CRISPR loci (pre-crRNA) are cleaved specifically in the repeat sequences by CRISPR associated (Cas) endoribonucleases in type I and type III systems or by RNase III in type II systems; the generated crRNAs are utilized by a Cas protein complex as a guide RNA to detect complementary sequences of either invading DNA or RNA. Cleavage of target nucleic acids has been demonstrated in vitro for the Pyrococcus furiosus type III-B system, which cleaves RNA in a ruler-anchored mechanism, and, more recently, in vivo for the Streptococcus thermophilus type II system, which cleaves DNA in the complementary target sequence (protospacer). In contrast, for type I systems the mechanism of CRISPR-interference is still largely unknown.

The model organism Escherichia coli strain K12 possesses a CRISPR/Cas type I-E system (previously known as CRISPR subtype E (Cse)). It contains eight cas genes (cas1, cas2, cas3 and cse1, cse2, cas7, cas5, cas6e) and a downstream CRISPR (type-2 repeats). In Escherichia coli K12 the eight cas genes are encoded upstream of the CRISPR locus. Cas1 and Cas2 do not appear to be needed for target interference, but are likely to participate in new target sequence acquisition. In contrast, six Cas proteins: Cse1, Cse2, Cas3, Cas7, Cas5 and Cas6e (previously also known as CasA, CasB, Cas3, CasC/Cse4, CasD and CasE/Cse3 respectively) are essential for protection against lambda phage challenge. Five of these proteins: Cse1, Cse2, Cas7, Cas5 and Cas6e (previously known as CasA, CasB, CasC/Cse4, CasD and CasE/Cse3 respectively) assemble with a crRNA to form a multi-subunit ribonucleoprotein (RNP) referred to as Cascade.

In E. coli, Cascade is a 405 kDa ribonucleoprotein complex composed of an unequal stoichiometry of five functionally essential Cas proteins: Cse1₁Cse2₂Cas7₆Cas5₁Cas6e₁ (i.e. under previous nomenclature CasA₁B₂C₆D₁E₁) and a 61-nt CRISPR-derived RNA. Cascade is an obligate RNP that relies on the crRNA for complex assembly and stability, and for the identification of invading nucleic acid sequences. Cascade is a surveillance complex that finds and binds foreign nucleic acids that are complementary to the spacer sequence of the crRNA.

Jore et al. (2011) entitled “Structural basis for CRISPR RNA-guided DNA recognition by Cascade” Nature Structural & Molecular Biology 18: 529-537 describes how the pre-crRNA transcript is cleaved by the Cas6e subunit of Cascade, resulting in the mature 61 nt crRNA being retained by the CRISPR complex. The crRNA serves as a guide RNA for sequence specific binding of Cascade to double stranded (ds) DNA molecules through base pairing between the crRNA spacer and the complementary protospacer, forming a so-called R-loop. This is known to be an ATP-independent process.

Brouns S. J. J., et al. (2008) entitled “Small CRISPR RNAs guide antiviral defense in prokaryotes” Science 321: 960-964 teaches that Cascade loaded with a crRNA requires Cas3 for in vivo phage resistance.

Marraffini L. & Sontheimer E. (2010) entitled “CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea” Nature Reviews Genetics 11: 181-190 is a review article which summarises the state of knowledge in the art in the field. Some suggestions are made about CRISPR-based applications and technologies, but this is mainly in the area of generating phage resistant strains of domesticated bacteria for the dairy industry. The specific cleavage of RNA molecules in vitro by a crRNP complex in Pyrococcus furiosus is suggested as something which awaits further development. Manipulation of CRISPR systems is also suggested as a possible way of reducing transmission of antibiotic-resistant bacterial strains in hospitals. The authors stress that further research effort will be needed to explore the potential utility of the technology in these areas.

US2011236530 A1 (Manoury et al.) entitled “Genetic cluster of strains of Streptococcus thermophilus having unique rheological properties for dairy fermentation” discloses certain S. thermophilus strains which ferment milk so that it is highly viscous and weakly ropy. A specific CRISPR locus of defined sequence is disclosed.

US2011217739 A1 (Terns et al.) entitled “Cas6 polypeptides and methods of use” discloses polypeptides which have Cas6 endoribonuclease activity. The polypeptides cleave a target RNA polynucleotide having a Cas6 recognition domain and cleavage site. Cleavage may be carried out in vitro or in vivo. Microbes such as E. coli or Haloferax volcanii are genetically modified so as to express Cas6 endoribonuclease activity.

WO2010054154 (Danisco) entitled “Bifidobacteria CRISPR sequences” discloses various CRISPR sequences found in Bifidobacteria and their use in making genetically altered strains of the bacteria which are altered in their phage resistance characteristics.

US2011189776 A1 (Terns et al.) entitled “Prokaryotic RNAi-like system and methods of use” describes methods of inactivating target polynucleotides in vitro or in prokaryotic microbes in vivo. The methods use a psiRNA having a 5′ region of 5-10 nucleotides chosen from a repeat from a CRISPR locus immediately upstream of a spacer. The 3′ region is substantially complementary to a portion of the target polynucleotide. Also described are polypeptides having endonuclease activity in the presence of psiRNA and target polynucleotide.

EP2341149 A1 (Danisco) entitled “Use of CRISPR associated genes (CAS)” describes how one or more Cas genes can be used for modulating resistance of bacterial cells against bacteriophage; particularly bacteria which provide a starter culture or probiotic culture in dairy products.

WO2010075424 (The Regents of the University of California) entitled “Compositions and methods for downregulating prokaryotic genes” discloses an isolated polynucleotide comprising a CRISPR array. At least one spacer of the CRISPR is complementary to a gene of a prokaryote so that it can down-regulate expression of the gene; particularly where the gene is associated with biofuel production.

WO2008108989 (Danisco) entitled “Cultures with improved phage resistance” discloses selecting bacteriophage resistant strains of bacteria and also selecting the strains which have an additional spacer having 100% identity with a region of phage RNA. Improved strain combinations and starter culture rotations are described for use in the dairy industry. Certain phages are described for use as biocontrol agents.

WO2009115861 (Institut Pasteur) entitled “Molecular typing and subtyping of Salmonella by identification of the variable nucleotide sequences of the CRISPR loci” discloses methods for detecting and identifying bacteria of the Salmonella genus by using their variable nucleotide sequences contained in CRISPR loci.

WO2006073445 (Danisco) entitled “Detection and typing of bacterial strains” describes detecting and typing of bacterial strains in food products, dietary supplements and environmental samples. Strains of Lactobacillus are identified through specific CRISPR nucleotide sequences.

Urnov F et al. (2010) entitled “Genome editing with engineered zinc finger nucleases” Nature Reviews Genetics 11: 636-646 is a review article about zinc finger nucleases and how they have been instrumental in the field of reverse genetics in a range of model organisms. Zinc finger nucleases have been developed so that precisely targeted genome cleavage is possible, followed by gene modification in the subsequent repair process. However, zinc finger nucleases are generated by fusing a number of zinc finger DNA-binding domains to a DNA cleavage domain. DNA sequence specificity is achieved by coupling several zinc fingers in series, each recognising a three nucleotide motif. A significant drawback with the technology is that new zinc fingers need to be developed for each new DNA locus which is required to be cleaved. This requires protein engineering and extensive screening to ensure specificity of DNA binding.

In the fields of genetic engineering and genomic research there is an ongoing need for improved agents for sequence/site specific nucleic acid detection and/or cleavage.

The inventors have made a surprising discovery in that certain bacteria expressing Cas3, which has helicase-nuclease activity, express Cas3 as a fusion with Cse1. The inventors have also unexpectedly been able to produce artificial fusions of Cse1 with other nuclease enzymes.

The inventors have also discovered that Cas3-independent target DNA recognition by Cascade marks DNA for cleavage by Cas3, and that Cascade DNA binding is governed by topological requirements of the target DNA.

The inventors have further found that Cascade is unable to bind relaxed target plasmids, but surprisingly Cascade displays high affinity for targets which have a negatively supercoiled (nSC) topology.

Accordingly in a first aspect the present invention provides a clustered regularly interspaced short palindromic repeat (CRISPR)-associated complex for antiviral defence (Cascade), the Cascade protein complex, or portion thereof, comprising at least CRISPR-associated protein subunits:

-   Cas7 (or COG 1857) having an amino acid sequence of SEQ ID NO:3 or a sequence of at least 18% identity therewith,
-   Cas5 (or COG 1688) having an amino acid sequence of SEQ ID NO:4 or a sequence of at least 17% identity therewith, and
-   Cas6 (or COG 1583) having an amino acid sequence of SEQ ID NO:5 or a sequence of at least 16% identity therewith;

and wherein at least one of the subunits includes an additional amino acid sequence providing nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity.

A subunit which includes an additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity is an example of what may be termed “a subunit linked to at least one functional moiety”; a functional moiety being the polypeptide or protein made up of the additional amino acid sequence. The transcription activating activity may be that leading to activation or upregulation of a desired gene; the transcription repressing activity may be that leading to repression or downregulation of a desired gene. The gene is selected by targeting the Cascade complex of the invention with an RNA molecule, as described further below.

The additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity is preferably formed of contiguous amino acid residues. These additional amino acids may be viewed as a polypeptide or protein which is contiguous and forms part of the Cas or Cse subunit(s) concerned. Such a polypeptide or protein sequence is preferably not normally part of any Cas or Cse subunit amino acid sequence. In other words, the additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may be other than a Cas or Cse subunit amino acid sequence, or portion thereof, i.e. may be other than a Cas3 subunit amino acid sequence or portion thereof.

The additional amino acid sequence with nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may, as desired, be obtained or derived from the same organism, e.g. E. coli, as the Cas or Cse subunit(s).

Additionally and/or alternatively to the above, the additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may be “heterologous” to the amino acid sequence of the Cas or Cse subunit(s). Therefore, the additional amino acid sequence may be obtained or derived from an organism different from the organism from which the Cas and/or Cse subunit(s) are derived or originate.

Throughout, sequence identity may be determined by way of BLAST and subsequent Cobalt multiple sequence alignment at the National Center for Biotechnology Information webserver, where the sequence in question is compared to a reference sequence (e.g. SEQ ID NO: 3, 4 or 5). The amino acid sequences may be defined in terms of percentage sequence similarity based on a BLOSUM62 matrix or percentage identity with a given reference sequence (e.g. SEQ ID NO: 3, 4 or 5). Determining the similarity or identity of a sequence involves an initial step of making the best alignment before calculating the percentage conservation with the reference, and the result reflects a measure of the evolutionary relationship of the sequences.
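As a rough illustration of the identity calculation described above, the following Python sketch computes a percentage identity from two sequences that have already been aligned (for example by BLAST or Cobalt). The function name, the gap character "-" and the choice of denominator (every alignment column in which at least one sequence has a residue) are assumptions made for this illustration. Alignment tools use differing denominators, so the value may not match the figure reported by a given webserver.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Both inputs must be rows of the same alignment (equal length, '-' for gaps).
    Columns that are gaps in both rows are ignored (an assumption of this sketch).
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if q == "-" and r == "-":
            continue  # column carries no residue in either sequence
        compared += 1
        if q == r:  # q cannot be '-' here unless r is a residue, so this is a true match
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not taken from the Sequence Listing.
    print(percent_identity("MYLSK-VI", "MYLAKAVI"))
```

A threshold such as "at least 18% identity" would then be a simple comparison of the returned value against 18.0.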

Cas7 may have a sequence similarity of at least 31% with SEQ ID NO:3; Cas5 may have a sequence similarity of at least 26% with SEQ ID NO:4; Cas6 may have a sequence similarity of at least 27% with SEQ ID NO:5.

For Cse1/CasA (502 AA):
>gi|16130667|ref|NP_417240.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 1]
MNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASILERRHDVLMFNQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG

For Cse2/CasB (160 AA):
>gi|16130666|ref|NP_417239.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 2]
MADEIDAMALYRAWQQLDNGSCAQIRRVSEPDELRDIPAFYRLVQPFGWENPRHQQALLRMVFCLSAGKNVIRHQDKKSEQTTGISLGRALANSGRINERRIFQLIRADRTADMVQLRRLLTHAEPVLDWPLMARMLTWWGKRERQQLLEDFVLTTNKNA

For Cas7/CasC/Cse4 (363 AA):
>gi|16130665|ref|NP_417238.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 3]
MSNFINIHVLISHSPSCLNRDDMNMQKDAIFGGKRRVRISSQSLKRAMRKSGYYAQNIGESSLRTIHLAQLRDVLRQKLGERFDQKIIDKTLALLSGKSVDEAEKISADAVTPWVVGEIAWFCEQVAKAEADNLDDKKLLKVLKEDIAAIRVNLQQGVDIALSGRMATSGMMTELGKVDGAMSIAHAITTHQVDSDIDWFTAVDDLQEQGSAHLGTQEFSSGVFYRYANINLAQLQENLGGASREQALEIATHVVHMLATEVPGAKQRTYAAFNPADMVMVNFSDMPLSMANAFEKAVKAKDGFLQPSIQAFNQYWDRVANGYGLNGAAAQFSLSDVDPITAQVKQMPTLEQLKSWVRNNGEA

For Cas5/CasD (224 AA):
>gi|90111483|ref|NP_417237.2| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 4]
MRSYLILRLAGPMQAWGQPTFEGTRPTGRFPTRSGLLGLLGACLGIQRDDTSSLQALSESVQFAVRCDELILDDRRVSVTGLRDYHTVLGAREDYRGLKSHETIQTWREYLCDASFTVALWLTPHATMVISELEKAVLKPRYTPYLGRRSCPLTHPLFLGTCQASDPQKALLNYEPVGGDIYSEESVTGHHLKFTARDEPMITLPRQFASREWYVIKGGMDVSQ

For Cas6e/CasE (199 AA):
>gi|16130663|ref|NP_417236.1| CRISPR RNA precursor cleavage enzyme; CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 5]
MYLSKVIIARAWSRDLYQLHQGLWHLFPNRPDAARDFLFHVEKRNTPEGCHVLLQSAQMPVSTAVATVIKTKQVEFQLQVGVPLYFRLRANPIKTILDNQKRLDSKGNIKRCRVPLIKEAEQIAWLQRKLGNAARVEDVHPISERPQYFSGDGKSGKIQTVCFEGVLTINDAPALIDLVQQGIGPAKSMGCGLLSLAPL

In defining the range of sequence variants which fall within the scope of the invention, for the avoidance of doubt, the following are each optional limits on the extent of variation, to be applied for each of SEQ ID NO:1, 2, 3, 4 or 5 starting from the respective broadest range of variants as specified in terms of the respective percentage identity above. The range of variants may therefore include: at least 16%, or at least 17%, or at least 18%, or at least 19%, or at least 20%, or at least 21%, or at least 22%, or at least 23%, or at least 24%, or at least 25%, or at least 26%, or at least 27%, or at least 28%, or at least 29%, or at least 30%, or at least 31%, or at least 32%, or at least 33%, or at least 34%, or at least 35%, or at least 36%, or at least 37%, or at least 38%, or at least 39%, or at least 40%, or at least 41%, or at least 42%, or at least 43%, or at least 44%, or at least 45%, or at least 46%, or at least 47%, or at least 48%, or at least 49%, or at least 50%, or at least 51%, or at least 52%, or at least 53%, or at least 54%, or at least 55%, or at least 56%, or at least 57%, or at least 58%, or at least 59%, or at least 60%, or at least 61%, or at least 62%, or at least 63%, or at least 64%, or at least 65%, or at least 66%, or at least 67%, or at least 68%, or at least 69%, or at least 70%, or at least 71%, or at least 72%, or at least 73%, or at least 74%, or at least 75%, or at least 76%, or at least 77%, or at least 78%, or at least 79%, or at least 80%, or at least 81%, or at least 82%, or at least 83%, or at least 84%, or at least 85%, or at least 86%, or at least 87%, or at least 88%, or at least 89%, or at least 90%, or at least 91%, or at least 92%, or at least 93%, or at least 94%, or at least 95%, or at least 96%, or at least 97%, or at least 98%, or at least 99%, or 100% amino acid sequence identity.

Throughout, the Makarova et al. (2011) nomenclature is being used in the definition of the Cas protein subunits. Table 2 on page 5 of the Makarova et al. article lists the Cas genes and the names of the families and superfamilies to which they belong. Throughout, reference to a Cas protein or Cse protein subunit includes cross reference to the family or superfamily of which these subunits form part.

Throughout, the reference sequences of the Cas and Cse subunits of the invention may be defined as a nucleotide sequence encoding the amino acid sequence. For example, the amino acid sequence of SEQ ID NO:3 for Cas7 also includes all nucleic acid sequences which encode that amino acid sequence. The variants of Cas7 included within the scope of the invention therefore include nucleotide sequences of at least the defined amino acid percentage identities or similarities with the reference nucleic acid sequence; as well as all possible percentage identities or similarities between that lower limit and 100%.

The Cascade complexes of the invention may be made up of subunits derived or modified from more than one different bacterial or archaeal prokaryote. Also, the subunits from different Cas subtypes may be mixed.

In a preferred aspect, the Cas6 subunit is a Cas6e subunit of SEQ ID NO:17 below, or a sequence of at least 16% identity therewith.

The sequence of a preferred Cas6e subunit is:
>gi|16130663|ref|NP_417236.1| CRISPR RNA precursor cleavage enzyme; CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 17]
MYLSKVIIARAWSRDLYQLHQGLWHLFPNRPDAARDFLFHVEKRNTPEGCHVLLQSAQMPVSTAVATVIKTKQVEFQLQVGVPLYFRLRANPIKTILDNQKRLDSKGNIKRCRVPLIKEAEQIAWLQRKLGNAARVEDVHPISERPQYFSGDGKSGKIQTVCFEGVLTINDAPALIDLVQQGIGPAKSMGCGLLSLAPL

The Cascade complexes, or portions thereof, of the invention, which comprise at least one subunit which includes an additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity, may further comprise a Cse2 (or YgcK-like) subunit having an amino acid sequence of SEQ ID NO:2 or a sequence of at least 20% identity therewith, or a portion thereof. Alternatively, the Cse2 subunit is defined as having at least 38% similarity with SEQ ID NO:2. Optionally, within the protein complex of the invention it is the Cse2 subunit which includes the additional amino acid sequence having nucleic acid or chromatin modifying activity.

Additionally or alternatively, the Cascade complexes of the invention may further comprise a Cse1 (or YgcL-like) subunit having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 9% identity therewith, or a portion thereof. Optionally within the protein complex of the invention it is the Cse1 subunit which includes the additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity.

In preferred embodiments, a Cascade complex of the invention is a Type I CRISPR-Cas system protein complex; more preferably a subtype I-E CRISPR-Cas protein complex or it can be based on a Type I-A or Type I-B complex. A Type I-C, D or F complex is possible. In particularly preferred embodiments based on the E. coli system, the subunits may have the following stoichiometries: Cse1₁Cse2₂Cas7₆Cas5₁Cas6₁ or Cse1₁Cse2₂Cas7₆Cas5₁Cas6e₁.

The additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may be translationally fused through expression in natural or artificial protein expression systems, or covalently linked by a chemical synthesis step to the at least one subunit; preferably the at least one functional moiety is fused or linked to at least the region of the N terminus and/or the region of the C terminus of at least one of a Cse1, Cse2, Cas7, Cas5, Cas6 or Cas6e subunit. In particularly preferred embodiments, the additional amino acid sequence having nucleic acid or chromatin modifying activity is fused or linked to the N terminus or the C terminus of a Cse1, a Cse2 or a Cas5 subunit; more preferably the linkage is in the region of the N terminus of a Cse1 subunit, the N terminus of a Cse2 subunit, or the N terminus of a Cas7 subunit.

The additional amino acid sequence having nucleic acid or chromatin modifying, activating, repressing or visualising activity may be a protein; optionally selected from a helicase, a nuclease, a nuclease-helicase, a DNA methyltransferase (e.g. Dam), or DNA demethylase, a histone methyltransferase, a histone demethylase, an acetylase, a deacetylase, a phosphatase, a kinase, a transcription (co-)activator, an RNA polymerase subunit, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein (e.g. mCherry or a heavy metal binding protein), a signal peptide (e.g. Tat-signal sequence), a subcellular localisation sequence (e.g. nuclear localisation sequence) or an antibody epitope.

The protein concerned may be a heterologous protein from a species other than the bacterial species from which the Cascade protein subunits have their sequence origin.

When the protein is a nuclease, it may be one selected from a type II restriction endonuclease such as FokI, or a mutant or an active portion thereof. Other type II restriction endonucleases which may be used include EcoRI, EcoRV, BglI, BamHI, BsgI and BspMI. Preferably, one protein complex of the invention may be fused to the N terminal domain of FokI and another protein complex of the invention may be fused to the C terminal domain of FokI. These two protein complexes may then be used together to achieve an advantageous locus specific double stranded cut in a nucleic acid, whereby the location of the cut in the genetic material is at the design and choice of the user, as guided by the RNA component (defined and described below) and due to the presence of a so-called “protospacer adjacent motif” (PAM) sequence in the target nucleic acid strand (also described in more detail below).

In a preferred embodiment, a protein complex of the invention has an additional amino acid sequence which is a modified restriction endonuclease, e.g. FokI. The modification is preferably in the catalytic domain. In preferred embodiments, the modified FokI is KKR Sharkey or ELD Sharkey which is fused to the Cse1 protein of the protein complex. In a preferred application of these complexes of the invention, two of these complexes (KKR Sharkey and ELD Sharkey) may be used together in combination. A heterodimer pair of protein complexes employing differently modified FokI has particular advantage in targeted double stranded cutting of nucleic acid. If homodimers are used then it is possible that there is more cleavage at non-target sites due to non-specific activity. A heterodimer approach advantageously increases the fidelity of the cleavage in a sample of material.

The Cascade complex with additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity defined and described above is a component part of an overall system of the invention which advantageously permits the user to select in a predetermined manner a precise genetic locus which is desired to be cleaved, tagged or otherwise altered in some way, e.g. methylation, using any of the nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing entities defined herein. The other component part of the system is an RNA molecule which acts as a guide for directing the Cascade complex of the invention to the correct locus on DNA or RNA intended to be modified, cut or tagged.

The Cascade complex of the invention preferably also comprises an RNA molecule which comprises a ribonucleotide sequence of at least 50% identity to a desired target nucleic acid sequence, and wherein the protein complex and the RNA molecule form a ribonucleoprotein complex. Preferably the ribonucleoprotein complex forms when the RNA molecule is hybridized to its intended target nucleic acid sequence. The ribonucleoprotein complex forms when the necessary components of Cascade-functional moiety combination and RNA molecule and nucleic acid (DNA or RNA) are present together in suitable physiological conditions, whether in vivo or in vitro. Without wishing to be bound by any particular theory, the inventors believe that in the context of dsDNA, particularly negatively supercoiled DNA, the Cascade complex associating with the dsDNA causes a partial unwinding of the duplex strands which then allows the RNA to associate with one strand; the whole ribonucleoprotein complex then migrates along the DNA strand until a target sequence substantially complementary to at least a portion of the RNA sequence is reached, at which point a stable interaction between RNA and DNA strand occurs, and the function of the functional moiety takes effect, whether by modifying, nuclease cutting or tagging of the DNA at that locus.

In preferred embodiments, a portion of the RNA molecule has at least 50% identity to the target nucleic acid sequence; more preferably at least 95% identity to the target sequence. In more preferred embodiments, the portion of the RNA molecule is substantially complementary along its length to the target DNA sequence; i.e. there are only one, two, three, four or five mismatches, which may be contiguous or non-contiguous. The RNA molecule (or portion thereof) may have at least 51%, or at least 52%, or at least 53%, or at least 54%, or at least 55%, or at least 56%, or at least 57%, or at least 58%, or at least 59%, or at least 60%, or at least 61%, or at least 62%, or at least 63%, or at least 64%, or at least 65%, or at least 66%, or at least 67%, or at least 68%, or at least 69%, or at least 70%, or at least 71%, or at least 72%, or at least 73%, or at least 74%, or at least 75%, or at least 76%, or at least 77%, or at least 78%, or at least 79%, or at least 80%, or at least 81%, or at least 82%, or at least 83%, or at least 84%, or at least 85%, or at least 86%, or at least 87%, or at least 88%, or at least 89%, or at least 90%, or at least 91%, or at least 92%, or at least 93%, or at least 94%, or at least 95%, or at least 96%, or at least 97%, or at least 98%, or at least 99%, or 100% identity to the target sequence.
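The mismatch criterion above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the target DNA strand is assumed to be supplied in 3′ to 5′ orientation so that position i of the RNA faces position i of the DNA, only Watson-Crick pairs are counted as matches, and the passage's "identity to the target sequence" is treated here as complementarity to the strand that pairs with the RNA.

```python
from typing import List

# Watson-Crick DNA partner for each RNA base (assumption: no wobble pairs counted).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def mismatch_positions(rna_5_to_3: str, dna_3_to_5: str) -> List[int]:
    """Return 1-based positions where the RNA base cannot pair with the facing DNA base."""
    if len(rna_5_to_3) != len(dna_3_to_5):
        raise ValueError("RNA portion and target must have the same length")
    pairs = zip(rna_5_to_3.upper(), dna_3_to_5.upper())
    return [
        position
        for position, (rna_base, dna_base) in enumerate(pairs, start=1)
        if RNA_TO_DNA_PARTNER.get(rna_base) != dna_base
    ]


def percent_complementarity(rna_5_to_3: str, dna_3_to_5: str) -> float:
    """Percentage of positions that form a Watson-Crick pair."""
    if not rna_5_to_3:
        raise ValueError("sequences must not be empty")
    mismatches = mismatch_positions(rna_5_to_3, dna_3_to_5)
    return 100.0 * (len(rna_5_to_3) - len(mismatches)) / len(rna_5_to_3)


def is_substantially_complementary(
    rna_5_to_3: str, dna_3_to_5: str, max_mismatches: int = 5
) -> bool:
    """True if the number of mismatches does not exceed max_mismatches (five in the text)."""
    return len(mismatch_positions(rna_5_to_3, dna_3_to_5)) <= max_mismatches
```

For instance, with the toy inputs `"AUGC"` and `"TACC"` the only mismatch is at position 4.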

The target nucleic acid may be DNA (ss or ds) or RNA.

In other preferred embodiments, the RNA molecule or portion thereof has at least 70% identity with the target nucleic acid. At such levels of identity, the target nucleic acid is preferably dsDNA.

The RNA molecule will preferably require a high specificity and affinity for the target nucleic acid sequence. A dissociation constant (K_d) in the range 1 pM to 1 μM, preferably 1-100 nM, is desirable as determined by preferably native gel electrophoresis, or alternatively isothermal titration calorimetry, surface plasmon resonance, or fluorescence based titration methods. Affinity may be determined using an electrophoretic mobility shift assay (EMSA), also called gel retardation assay (see Semenova E et al. (2011) Proc. Natl. Acad. Sci. USA 108: 10098-10103).
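To illustrate what a dissociation constant in this range implies, the sketch below uses the simple one-site binding isotherm, fraction bound = [complex] / (K_d + [complex]). This model is an assumption made for illustration and is not prescribed by the specification. It holds only when the ribonucleoprotein complex is in large excess over the labelled target, so that free and total complex concentrations are approximately equal. The function name and the choice of nanomolar units are also hypothetical.

```python
def fraction_bound(complex_conc_nM: float, kd_nM: float) -> float:
    """Fraction of target bound under a one-site model with the complex in excess."""
    if complex_conc_nM < 0:
        raise ValueError("concentration must be non-negative")
    if kd_nM <= 0:
        raise ValueError("K_d must be positive")
    return complex_conc_nM / (kd_nM + complex_conc_nM)


if __name__ == "__main__":
    # Titration series as might be loaded on a mobility shift gel (illustrative values).
    for conc in (1.0, 10.0, 100.0, 1000.0):
        print(conc, round(fraction_bound(conc, kd_nM=10.0), 3))
```

Under this model, half of the target is bound when the complex concentration equals K_d, which is how K_d is commonly read off a titration curve.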

The RNA molecule is preferably modelled on what are known from nature in prokaryotes as CRISPR RNA (crRNA) molecules. The structure of crRNA molecules is already established and explained in more detail in Jore et al. (2011) Nature Structural & Molecular Biology 18: 529-537. In brief, a mature crRNA of type I-E is often 61 nucleotides long and consists of a 5′ “handle” region of 8 nucleotides, the “spacer” sequence of 32 nucleotides, and a 3′ sequence of 21 nucleotides which form a hairpin with a tetranucleotide loop. However, the RNA used in the invention does not have to be designed strictly to the design of naturally occurring crRNA, whether in length, regions or specific RNA sequences. What is clear though, is that RNA molecules for use in the invention may be designed based on gene sequence information in the public databases or newly discovered, and then made artificially, e.g. by chemical synthesis in whole or in part. The RNA molecules of the invention may also be designed and produced by way of expression in genetically modified cells or cell free expression systems and this option may include synthesis of some or all of the RNA sequence.
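The 8 + 32 + 21 = 61 nucleotide layout described above can be sketched as a simple split of an RNA string into its three regions. This is an illustration under stated assumptions: the function name is hypothetical, the default lengths are those given for a mature type I-E crRNA, and the example sequence is an arbitrary placeholder rather than a real crRNA.

```python
from typing import Dict


def split_crrna(crrna: str, handle5_len: int = 8, spacer_len: int = 32) -> Dict[str, str]:
    """Split an RNA into 5' handle, spacer and 3' region (everything after the spacer)."""
    spacer_end = handle5_len + spacer_len
    if len(crrna) <= spacer_end:
        raise ValueError("RNA too short for the requested 5' handle and spacer")
    return {
        "handle5": crrna[:handle5_len],
        "spacer": crrna[handle5_len:spacer_end],
        "handle3": crrna[spacer_end:],
    }


# Placeholder 61-residue sequence (not a real crRNA): 8 A, then 32 C, then 21 G.
placeholder = "A" * 8 + "C" * 32 + "G" * 21
parts = split_crrna(placeholder)
print({name: len(seq) for name, seq in parts.items()})
```

Because the lengths are parameters, the same split applies to designs that depart from the natural layout, for example a 33-residue targeting portion.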

The structure and requirements of crRNA have also been described in Semenova E et al. (2011) Proc. Natl. Acad. Sci. USA 108: 10098-10103. There is a so-called “SEED” portion forming the 5′ end of the spacer sequence and which is flanked 5′ thereto by the 5′ handle of 8 nucleotides. Semenova et al. (2011) have found that all residues of the SEED sequence should be complementary to the target sequence, although for the residue at position 6, a mismatch may be tolerated. Similarly, when designing and making an RNA component of a ribonucleoprotein complex of the invention directed at a target locus (i.e. sequence), the necessary match and mismatch rules for the SEED sequence can be applied.
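The SEED rule described above can be expressed as a small check. This is a sketch under stated assumptions, not a definitive implementation: the SEED is taken to be 8 contiguous residues at the 5′ end of the spacer (the length used elsewhere in this description), and the protospacer is supplied in the same sense as the spacer, so that a perfectly matching target reads identically to the spacer with T in place of U. The function names and sequences are hypothetical.

```python
from typing import List

SEED_LENGTH = 8          # assumed number of SEED residues at the 5' end of the spacer
TOLERATED_POSITION = 6   # 1-based SEED position at which a mismatch may be tolerated


def seed_mismatches(spacer: str, protospacer: str) -> List[int]:
    """Return 1-based SEED positions where spacer and same-sense protospacer differ."""
    if len(spacer) < SEED_LENGTH or len(protospacer) < SEED_LENGTH:
        raise ValueError("both sequences must cover the full SEED")
    s = spacer.upper().replace("U", "T")[:SEED_LENGTH]
    p = protospacer.upper().replace("U", "T")[:SEED_LENGTH]
    return [i + 1 for i, (a, b) in enumerate(zip(s, p)) if a != b]


def seed_rule_satisfied(spacer: str, protospacer: str) -> bool:
    """True if every SEED residue matches, ignoring a mismatch at position 6."""
    return all(pos == TOLERATED_POSITION for pos in seed_mismatches(spacer, protospacer))


spacer = "ACGUACGUGGCC"                              # hypothetical spacer fragment
print(seed_rule_satisfied(spacer, "ACGTACGTGGCC"))   # perfect SEED match
print(seed_rule_satisfied(spacer, "ACGTAAGTGGCC"))   # mismatch at position 6 only
print(seed_rule_satisfied(spacer, "ACTTACGTGGCC"))   # mismatch at position 3
```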

The invention therefore includes a method of detecting and/or locating a single base change in a target nucleic acid molecule comprising contacting a nucleic acid sample with a ribonucleoprotein complex of the invention as hereinbefore described, or with a Cascade complex and separate RNA component of the invention as hereinbefore described, and wherein the sequence of the RNA component (including when in the ribonucleoprotein complex) is such that it discriminates between a normal allele and a mutant allele by virtue of a single base change at position 6 of a contiguous sequence of 8 nucleotide residues.

In embodiments of the invention, the RNA molecule may have a length in the range of 35-75 residues. In preferred embodiments, the portion of the RNA which is complementary to and used for targeting a desired nucleic acid sequence is 32 or 33 residues long. (In the context of a naturally occurring crRNA, this would correspond to the spacer portion; as shown in FIG. 1 of Semenova et al. (2011)).

A ribonucleoprotein complex of the invention may additionally have an RNA component comprising 8 residues 5′ to the RNA sequence which has at least substantial complementarity to the nucleic acid target sequence. (The RNA sequence having at least substantial complementarity to the nucleic acid target sequence would be understood, in the context of a crRNA, to correspond to the spacer sequence. The 5′ flanking sequence of the RNA would be considered to correspond to the 5′ handle of a crRNA. This is shown in FIG. 1 of Semenova et al. (2011)).

A ribonucleoprotein complex of the invention may have a hairpin and tetranucleotide loop forming sequence 3′ to the RNA sequence which has at least substantial complementarity to the DNA target sequence. (In the context of crRNA, this would correspond to a 3′ handle flanking the spacer sequence as shown in FIG. 1 of Semenova et al. (2011)).

In some embodiments, the RNA may be a CRISPR RNA (crRNA).

The Cascade proteins and complexes of the invention may be characterised in vitro in terms of their activity of association with the RNA guiding component to form a ribonucleoprotein complex in the presence of the target nucleic acid (which may be DNA or RNA). An electrophoretic mobility shift assay (EMSA) may be used as a functional assay for interaction of complexes of the invention with their nucleic acid targets. Basically, a Cascade-functional moiety complex of the invention is mixed with nucleic acid targets and the stable interaction of the Cascade-functional moiety complex is monitored by EMSA or by specific readout of the functional moiety, for example endonucleolytic cleavage of target DNA at the desired site. This can be determined by further restriction fragment length analysis using commercially available enzymes with known specificities and cleavage sites in a target DNA molecule.

Visualisation of binding of Cascade proteins or complexes of the invention to DNA or RNA in the presence of guiding RNA may be achieved using scanning/atomic force microscopy (SFM/AFM) imaging and this may provide an assay for the presence of functional complexes of the invention.

The invention also provides a nucleic acid molecule encoding at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit selected from:

a. a Cse1 subunit having an amino acid sequence of SEQ ID NO:1 or a sequence of at least 9% identity therewith;
b. a Cse2 subunit having an amino acid sequence of SEQ ID NO:2 or a sequence of at least 20% identity therewith;
c. a Cas7 subunit having an amino acid sequence of SEQ ID NO:3 or a sequence of at least 18% identity therewith;
d. a Cas5 subunit having an amino acid sequence of SEQ ID NO:4 or a sequence of at least 17% identity therewith;
e. a Cas6 subunit having an amino acid sequence of SEQ ID NO:5 or a sequence of at least 16% identity therewith; and

wherein at least a, b, c, d or e includes an additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity.

The additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity is preferably fused to the CRISPR-associated protein subunit.

In the nucleic acids of the invention defined above, the nucleotide sequence may be that which encodes the respective SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4 or SEQ ID NO:5, or in defining the range of variant sequences thereto, it may be a sequence hybridisable to that nucleotide sequence, preferably under stringent conditions, more preferably very high stringency conditions. A variety of stringent hybridisation conditions will be familiar to the skilled reader in the field. Hybridization of a nucleic acid molecule occurs when two complementary nucleic acid molecules undergo an amount of hydrogen bonding to each other known as Watson-Crick base pairing. The stringency of hybridization can vary according to the environmental (i.e. chemical/physical/biological) conditions surrounding the nucleic acids, temperature, the nature of the hybridization method, and the composition and length of the nucleic acid molecules used. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed in Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001); and Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes Part I, Chapter 2 (Elsevier, New York, 1993). The T_m is the temperature at which 50% of a given strand of a nucleic acid molecule is hybridized to its complementary strand. The following is an exemplary set of hybridization conditions and is not limiting:

Very High Stringency (Allows Sequences that Share at Least 90% Identity to Hybridize)
Hybridization: 5×SSC at 65° C. for 16 hours
Wash twice: 2×SSC at room temperature (RT) for 15 minutes each
Wash twice: 0.5×SSC at 65° C. for 20 minutes each

High Stringency (Allows Sequences that Share at Least 80% Identity to Hybridize)
Hybridization: 5×-6×SSC at 65° C.-70° C. for 16-20 hours
Wash twice: 2×SSC at RT for 5-20 minutes each
Wash twice: 1×SSC at 55° C.-70° C. for 30 minutes each

Low Stringency (Allows Sequences that Share at Least 50% Identity to Hybridize)
Hybridization: 6×SSC at RT to 55° C. for 16-20 hours
Wash at least twice: 2×-3×SSC at RT to 55° C. for 20-30 minutes each.

The nucleic acid molecule may be an isolated nucleic acid molecule and may be an RNA or a DNA molecule.

The additional amino acid sequence may be selected from a helicase, a nuclease, a nuclease-helicase (e.g. Cas3), a DNA methyltransferase (e.g. Dam), a DNA demethylase, a histone methyltransferase, a histone demethylase, an acetylase, a deacetylase, a phosphatase, a kinase, a transcription (co-)activator, an RNA polymerase subunit, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein (e.g. mCherry or a heavy metal binding protein), a signal peptide (e.g. Tat-signal sequence), a subcellular localisation sequence (e.g. nuclear localisation sequence), or an antibody epitope. The additional amino acid sequence may be, or may be from, a different protein from the organism from which the relevant Cascade protein subunit(s) are derived.

The invention includes an expression vector comprising a nucleic acid molecule as hereinbefore defined. One expression vector may contain the nucleotide sequence encoding a single Cascade protein subunit and also the nucleotide sequence encoding the additional amino acid sequence, whereby on expression the subunit and additional sequence are fused. Other expression vectors may comprise nucleotide sequences encoding just one or more Cascade protein subunits which are not fused to any additional amino acid sequence.

The additional amino acid sequence with nucleic acid or chromatin modifying activity may be fused to any of the Cascade subunits via a linker polypeptide. The linker may be of any length up to about 60 or up to about 100 amino acid residues. Preferably the linker has a number of amino acids in the range 10 to 60, more preferably 10-20. The amino acids are preferably polar and/or small and/or charged amino acids (e.g. Gln, Ser, Thr, Pro, Ala, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr). The linker peptide is preferably designed to obtain the correct spacing and positioning of the fused functional moiety and the subunit of Cascade to which the moiety is fused to allow proper interaction with the target nucleotide.
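The linker preferences above can be sketched as a simple screening helper for candidate linker sequences. This is an illustrative sketch only: the function names and the example sequences are hypothetical, and because the residues named above are given as examples ("e.g."), residues outside that list are merely reported and are not treated as excluded.

```python
from typing import List

# One-letter codes for the example residues listed in the text:
# Gln, Ser, Thr, Pro, Ala, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr.
LISTED_RESIDUES = set("QSTPAEDKRHNCY")


def classify_linker_length(linker: str) -> str:
    """Classify a linker by the length ranges given in the text."""
    n = len(linker)
    if n == 0 or n > 100:
        return "outside stated range"
    if 10 <= n <= 20:
        return "more preferred (10-20)"
    if 10 <= n <= 60:
        return "preferred (10-60)"
    return "permitted (up to about 100)"


def residues_not_listed(linker: str) -> List[str]:
    """Return the sorted residues in the linker that are not among the listed examples."""
    return sorted(set(linker.upper()) - LISTED_RESIDUES)


candidate = "SAQTPEDKRHNS"  # hypothetical 12-residue linker
print(classify_linker_length(candidate), residues_not_listed(candidate))
print(classify_linker_length("GGGGS"), residues_not_listed("GGGGS"))
```

A check of this kind says nothing about spacing or positioning of the fused moiety, which the text identifies as the actual design goal.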

An expression vector of the invention (with or without nucleotide sequence encoding amino acid residues which on expression will be fused to a Cascade protein subunit) may further comprise a sequence encoding an RNA molecule as hereinbefore defined. Consequently, such expression vectors can be used in an appropriate host to generate a ribonucleoprotein of the invention which can target a desired nucleotide sequence.

Accordingly, the invention also provides a method of modifying, visualising, or activating or repressing transcription of a target nucleic acid comprising contacting the nucleic acid with a ribonucleoprotein complex as hereinbefore defined. The modifying may be by cleaving the nucleic acid or binding to it.

The invention also includes a method of modifying, visualising, or activating or repressing transcription of a target nucleic acid comprising contacting the nucleic acid with a Cascade protein complex as hereinbefore defined, plus an RNA molecule as hereinbefore defined.

In accordance with the above methods, the modification, visualising, or activating or repressing transcription of a target nucleic acid may therefore be carried out in vitro and in a cell free environment; i.e. the method is carried out as a biochemical reaction whether free in solution or whether involving a solid phase. Target nucleic acid may be bound to a solid phase, for example.

In a cell free environment, the order of adding each of the target nucleic acid, the Cascade protein complex and the RNA molecule is at the option of the average skilled person. The three components may be added simultaneously, sequentially in any desired order, or separately at different times and in a desired order. Thus it is possible for the target nucleic acid and RNA to be added simultaneously to a reaction mix and then the Cascade protein complex of the invention to be added separately and later in a sequence of specific method steps.

The modification, visualising, or activating or repressing transcription of a target nucleic acid may be made in situ in a cell, whether an isolated cell or as part of a multicellular tissue, organ or organism. Therefore in the context of whole tissue and organs, and in the context of an organism, the method can be carried out in vivo or it can be carried out by isolating a cell from the whole tissue, organ or organism and then returning the cell treated with ribonucleoprotein complex to its former location, or a different location, whether within the same or a different organism. Thus the method would include allografts, autografts, isografts and xenografts.

In these embodiments, the ribonucleoprotein complex or the Cascade protein complex of the invention requires an appropriate form of delivery into the cell, which will be well known to persons of skill in the art, including microinjection, whether into the cell cytoplasm or into the nucleus.

Also when present separately, the RNA molecule requires an appropriate form of delivery into a cell, whether simultaneously, separately or sequentially with the Cascade protein complex. Such forms of introducing RNA into cells are well known to a person of skill in the art and may include in vitro or ex vivo delivery via conventional transfection methods. Physical methods, such as microinjection and electroporation, as well as calcium co-precipitation, commercially available cationic polymers and lipids, cell-penetrating peptides and cell-penetrating particles (gene-gun) may each be used. For example, viruses may be used as delivery vehicles, whether to the cytoplasm and/or nucleus, e.g. via the (reversible) fusion of a Cascade protein complex of the invention or a ribonucleoprotein complex of the invention to the viral particle. Viral delivery (e.g. adenovirus delivery) or Agrobacterium-mediated delivery may be used.

The invention also includes a method of modifying, visualising, or activating or repressing transcription of a target nucleic acid in a cell, comprising transfecting, transforming or transducing the cell with any of the expression vectors as hereinbefore described. The methods of transfection, transformation or transduction are of the types well known to a person of skill in the art. Where there is one expression vector used to generate expression of a Cascade complex of the invention and when the RNA is added directly to the cell then the same or a different method of transfection, transformation or transduction may be used. Similarly, when there is one expression vector being used to generate expression of a Cascade-functional fusion complex of the invention and when another expression vector is being used to generate the RNA in situ via expression, then the same or a different method of transfection, transformation or transduction may be used.

In other embodiments, mRNA encoding the Cascade complex of the invention is introduced into a cell so that the Cascade complex is expressed in the cell. The RNA which guides the Cascade complex to the desired target sequence is also introduced into the cell, whether simultaneously, separately or sequentially from the mRNA, such that the necessary ribonucleoprotein complex is formed in the cell.

In the aforementioned methods of modifying or visualising a target nucleic acid, the additional amino acid sequence may be a marker and the marker associates with the target nucleic acid; preferably wherein the marker is a protein; optionally a fluorescent protein, e.g. green fluorescent protein (GFP) or yellow fluorescent protein (YFP) or mCherry. Whether in vitro, ex vivo or in vivo, then methods of the invention can be used to directly visualise a target locus in a nucleic acid molecule, preferably in the form of a higher order structure such as a supercoiled plasmid or chromosome, or a single stranded target nucleic acid such as mRNA. Direct visualisation of a target locus may use electron micrography, or fluorescence microscopy.

Other kinds of label may be used to mark the target nucleic acid including organic dye molecules, radiolabels and spin labels which may be small molecules.

In methods of the invention described above, the target nucleic acid is DNA, preferably dsDNA, although the target can be RNA, preferably mRNA.

In methods of the invention for modifying, visualising, activating transcription or repressing transcription of a target nucleic acid wherein the target nucleic acid is dsDNA, the additional amino acid sequence with nucleic acid or chromatin modifying activity may be a nuclease or a helicase-nuclease, and the modification is preferably a single stranded or a double stranded break at a desired locus. In this way unique sequence specific cutting of DNA can be engineered by using the Cascade-functional moiety complexes. The chosen sequence of the RNA component of the final ribonucleoprotein complex provides the desired sequence specificity for the action of the additional amino acid sequence.

Therefore, the invention also provides a method of non-homologous end joining of a dsDNA molecule in a cell at a desired locus to remove at least a part of a nucleotide sequence from the dsDNA molecule; optionally to knockout the function of a gene or genes, wherein the method comprises making double stranded breaks using any of the methods of modifying a target nucleic acid as hereinbefore described.

The invention further provides a method of homologous recombination of a nucleic acid into a dsDNA molecule in a cell at a desired locus in order to modify an existing nucleotide sequence or insert a desired nucleotide sequence, wherein the method comprises making a double or single stranded break at the desired locus using any of the methods of modifying a target nucleic acid as hereinbefore described.

The invention therefore also provides a method of modifying, activating or repressing gene expression in an organism comprising modifying, activating transcription or repressing transcription of a target nucleic acid sequence according to any of the methods hereinbefore described, wherein the nucleic acid is dsDNA and the functional moiety is selected from a DNA modifying enzyme (e.g. a demethylase or deacetylase), a transcription activator or a transcription repressor.

The invention additionally provides a method of modifying, activating or repressing gene expression in an organism comprising modifying, activating transcription or repressing transcription of a target nucleic acid sequence according to any of the methods hereinbefore described, wherein the nucleic acid is an mRNA and the functional moiety is a ribonuclease; optionally selected from an endonuclease, a 3′ exonuclease or a 5′ exonuclease.

In any of the methods of the invention as described above, the cell which is subjected to the method may be a prokaryote. Similarly, the cell may be a eukaryotic cell, e.g. a plant cell, an insect cell, a yeast cell, a fungal cell, a mammalian cell or a human cell. When the cell is of a mammal or human then it can be a stem cell (but may not be any human embryonic stem cell). Such stem cells for use in the invention are preferably isolated stem cells. Optionally in accordance with any method of the invention a cell is transfected in vitro.

Preferably though, in any of the methods of the invention, the target nucleic acid has a specific tertiary structure, optionally supercoiled, more preferably wherein the target nucleic acid is negatively supercoiled. Advantageously, the ribonucleoprotein complexes of the invention, whether produced in vitro, or whether formed within cells, or whether formed within cells via expression machinery of the cell, can be used to target a locus which would otherwise be difficult to get access to in order to apply the functional activity of a desired component, whether labelling or tagging of a specific sequence, modification of nucleic acid structure, switching on or off of gene expression, or modification of the target sequence itself involving single or double stranded cutting followed by insertion of one or more nucleotide residues or a cassette.

The invention also includes a pharmaceutical composition comprising a Cascade protein complex or a ribonucleoprotein complex of the invention as hereinbefore described.

The invention further includes a pharmaceutical composition comprising an isolated nucleic acid or an expression vector of the invention as hereinbefore described.

Also provided is a kit comprising a Cascade protein complex of the invention as hereinbefore described plus an RNA molecule of the invention as hereinbefore described.

The invention includes a Cascade protein complex or a ribonucleoprotein complex or a nucleic acid or a vector, as hereinbefore described, for use as a medicament.

The invention allows a variety of possibilities to physically alter DNA of prokaryotic or eukaryotic hosts at a specified genomic locus, or change expression patterns of a gene at a given locus. Host genomic DNA can be cleaved or modified by methylation, visualized by fluorescence, transcriptionally activated or repressed by functional domains such as nucleases, methylases, fluorescent proteins, transcription activators or repressors respectively, fused to suitable Cascade-subunits. Moreover, the RNA-guided RNA-binding ability of Cascade permits the monitoring of RNA trafficking in live cells using fluorescent Cascade fusion proteins, and provides ways to sequester or destroy host mRNAs causing interference with gene expression levels of a host cell.

In any of the methods of the invention, the target nucleic acid may be defined, preferably so if dsDNA, by the presence of at least one of the following nucleotide triplets: 5′-CTT-3′, 5′-CAT-3′, 5′-CCT-3′, or 5′-CTC-3′ (or 5′-CUU-3′, 5′-CAU-3′, 5′-CCU-3′, or 5′-CUC-3′ if the target is an RNA). The location of the triplet is in the target strand adjacent to the sequence to which the RNA molecule component of a ribonucleoprotein of the invention hybridizes. The triplet marks the point in the target strand sequence at which base pairing with the RNA molecule component of the ribonucleoprotein does not take place in a 5′ to 3′ (downstream) direction of the target (whilst it takes place upstream of the target sequence from that point subject to the preferred length of the RNA sequence of the RNA molecule component of the ribonucleoprotein of the invention). In the context of a native type I CRISPR system, the triplets correspond to what is known as a “PAM” (protospacer adjacent motif). For ssDNA or ssRNA targets, presence of one of the triplets is not so necessary.
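The triplet rule above lends itself to a simple scan of a target strand for candidate sites. The sketch below reflects one reading of the paragraph and is offered as an assumption-laden illustration: the target strand is given 5′ to 3′, the hybridizing region is taken to lie immediately upstream (5′) of the triplet, and the guide sequence is written as the reverse complement of that region in RNA letters. The function names and the test sequence are hypothetical.

```python
from typing import Iterator, Tuple

TRIPLETS = {"CTT", "CAT", "CCT", "CTC"}  # triplets listed in the text (DNA form)
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def guide_for(region: str) -> str:
    """Reverse complement of a DNA region, written as RNA (5' to 3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(region))


def candidate_sites(target_strand: str, spacer_len: int = 32) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (triplet index, triplet, hybridizing region, guide) for each listed triplet.

    The region is the spacer_len residues immediately 5' of the triplet on the
    target strand; this placement is an assumed reading of the description.
    """
    strand = target_strand.upper().replace("U", "T")
    for i in range(spacer_len, len(strand) - 2):
        triplet = strand[i:i + 3]
        if triplet in TRIPLETS:
            region = strand[i - spacer_len:i]
            yield i, triplet, region, guide_for(region)


# Short hypothetical target strand, scanned with a 10-residue region for brevity.
demo_strand = "GGGG" + "ACGTACGTAG" + "CTT" + "AAA"
for site in candidate_sites(demo_strand, spacer_len=10):
    print(site)
```

For ssDNA or ssRNA targets, where the text indicates the triplet is less necessary, a scan of this kind would be optional.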

The invention will now be described in detail and with reference to specific examples and drawings in which:

FIG. 1 shows the results of gel-shift assays where Cascade binds negatively supercoiled (nSC) plasmid DNA but not relaxed DNA. A) Gel-shift of nSC plasmid DNA with J3-Cascade, containing a targeting (J3) crRNA. pUC-λ was mixed with 2-fold increasing amounts of J3-Cascade, from a pUC-λ:Cascade molar ratio of 1:0.5 up to a 1:256 molar ratio. The first and last lanes contain only pUC-λ. B) Gel-shift as in (A) with R44-Cascade containing a non-targeting (R44) crRNA. C) Gel-shift as in (A) with Nt.BspQI nicked pUC-λ. D) Gel-shift as in (A) with PdmI linearized pUC-λ. E) Fit of the fraction pUC-λ bound to J3-Cascade plotted against the concentration of free J3-Cascade gives the dissociation constant (Kd) for specific binding. F) Fit of the fraction pUC-λ bound to R44-Cascade plotted against the concentration of free R44-Cascade gives the dissociation constant (Kd) for non-specific binding. G) Specific binding of Cascade to the protospacer monitored by restriction analysis, using the unique BsmI restriction site in the protospacer sequence. Lanes 1 and 5 contain only pUC-λ. Lanes 2 and 6 contain pUC-λ mixed with Cascade. Lanes 3 and 7 contain pUC-λ mixed with Cascade and subsequent BsmI addition. Lanes 4 and 8 contain pUC-λ mixed with BsmI. H) Gel-shift of pUC-λ bound to Cascade with subsequent Nt.BspQI cleavage of one strand of the plasmid. Lanes 1 and 6 contain only pUC-λ. Lanes 2 and 7 contain pUC-λ mixed with Cascade. Lanes 3 and 8 contain pUC-λ mixed with Cascade and subsequent Nt.BspQI nicking. Lanes 4 and 9 contain pUC-λ mixed with Cascade, followed by addition of a ssDNA probe complementary to the displaced strand in the R-loop and subsequent nicking with Nt.BspQI. Lanes 5 and 10 contain pUC-λ nicked with Nt.BspQI. I) Gel-shift of pUC-λ bound to Cascade with subsequent EcoRI cleavage of both strands of the plasmid. Lanes 1 and 6 contain only pUC-λ. Lanes 2 and 7 contain pUC-λ mixed with Cascade. Lanes 3 and 8 contain pUC-λ mixed with Cascade and subsequent EcoRI cleavage. Lanes 4 and 9 contain pUC-λ mixed with Cascade, followed by addition of a ssDNA probe complementary to the displaced strand in the R-loop and subsequent cleavage with EcoRI. Lanes 5 and 10 contain pUC-λ cleaved with EcoRI.

FIG. 2 shows scanning force micrographs demonstrating how Cascade induces bending of target DNA upon protospacer binding. A-P) Scanning force microscopy images of nSC plasmid DNA with J3-Cascade containing a targeting (J3) crRNA. pUC-λ was mixed with J3-Cascade at a pUC-λ:Cascade ratio of 1:7. Each image shows a 500×500 nm surface area. White dots correspond to Cascade.

FIG. 3 shows how BiFC analysis reveals that Cascade and Cas3 interact upon target recognition. A) Venus fluorescence of cells expressing CascadeΔCse1 and CRISPR 7Tm, which targets 7 protospacers on the phage genome, and Cse1-N155Venus and Cas3-C85Venus fusion proteins. B) Brightfield image of the cells in (A). C) Overlay of (A) and (B). D) Venus fluorescence of phage λ infected cells expressing CascadeΔCse1 and CRISPR 7Tm, and Cse1-N155Venus and Cas3-C85Venus fusion proteins. E) Brightfield image of the cells in (D). F) Overlay of (D) and (E). G) Venus fluorescence of phage λ infected cells expressing CascadeΔCse1 and non-targeting CRISPR R44, and N155Venus and C85Venus proteins. H) Brightfield image of the cells in (G). I) Overlay of (G) and (H). J) Average of the fluorescence intensity of 4-7 individual cells of each strain, as determined using the profile tool of LSM viewer (Carl Zeiss).

FIG. 4 shows Cas3 nuclease and helicase activities during CRISPR-interference. A) Competent BL21-AI cells expressing Cascade, a Cas3 mutant and CRISPR J3 were transformed with pUC-λ. Colony forming units per microgram pUC-λ (cfu/μg DNA) are depicted for each of the strains expressing a Cas3 mutant. Cells expressing wt Cas3 and CRISPR J3 or CRISPR R44 serve as positive and negative controls, respectively. B) BL21-AI cells carrying Cascade, Cas3 mutant, and CRISPR encoding plasmids as well as pUC-λ are grown under conditions that suppress expression of the cas genes and CRISPR. At t=0 expression is induced. The percentage of cells that lost pUC-λ over time is shown, as determined by the ratio of ampicillin sensitive and ampicillin resistant cells.

FIG. 5 shows how a Cascade-Cas3 fusion complex provides in vivo resistance and has in vitro nuclease activity. A) Coomassie Blue stained SDS-PAGE of purified Cascade and Cascade-Cas3 fusion complex. B) Efficiency of plaquing of phage λ on cells expressing Cascade-Cas3 fusion complex and a targeting (J3) or non-targeting (R44) CRISPR and on cells expressing Cascade and Cas3 separately together with a targeting (J3) CRISPR. C) Gel-shift (in the absence of divalent metal ions) of nSC target plasmid with J3-Cascade-Cas3 fusion complex. pUC-λ was mixed with 2-fold increasing amounts of J3-Cascade-Cas3, from a pUC-λ:J3-Cascade-Cas3 molar ratio of 1:0.5 up to a 1:128 molar ratio. The first and last lanes contain only pUC-λ. D) Gel-shift (in the absence of divalent metal ions) of nSC non-target plasmid with J3-Cascade-Cas3 fusion complex. pUC-p7 was mixed with 2-fold increasing amounts of J3-Cascade-Cas3, from a pUC-p7:J3-Cascade-Cas3 molar ratio of 1:0.5 up to a 1:128 molar ratio. The first and last lanes contain only pUC-p7. E) Incubation of nSC target plasmid (pUC-λ, left) or nSC non-target plasmid (pUC-p7, right) with J3-Cascade-Cas3 in the presence of 10 mM MgCl₂. Lanes 1 and 7 contain only plasmid. F) Assay as in (E) in the presence of 2 mM ATP. G) Assay as in (E) with the mutant J3-Cascade-Cas3K320N complex. H) Assay as in (G) in the presence of 2 mM ATP.

FIG. 6 is a schematic diagram showing a model of the CRISPR-interference type I pathway in E. coli.

FIG. 7 is a schematic diagram showing how a Cascade-FokI fusion embodiment of the invention is used to create FokI dimers which cut dsDNA to produce blunt ends as part of a process of non-homologous end joining or homologous recombination.

FIG. 8 shows how BiFC analysis reveals that Cascade and Cas3 interact upon target recognition. Overlay of Brightfield image and Venus fluorescence of cells expressing Cascade without Cse1, Cse1-N155Venus and Cas3-C85Venus and either CRISPR 7Tm, which targets 7 protospacers on the phage Lambda genome, or the non-targeting CRISPR R44. Cells expressing CRISPR 7Tm are fluorescent only when infected with phage Lambda, while cells expressing CRISPR R44 are non-fluorescent. The highly intense fluorescent dots (outside cells) are due to light-reflecting salt crystals. White bars correspond to 10 micron.

FIG. 9 shows pUC-λ sequences of 4 clones [SEQ ID NOs: 39-42] encoding CRISPR J3, Cascade and Cas3 (wt or S483AT485A), which indicate that these are escape mutants carrying (partial) deletions of the protospacer or carrying a single point mutation in the seed region, which explains the inability to cure these plasmids.

FIG. 10 shows sequence alignments of cas3 genes from organisms containing the Type I-E CRISPR/Cas system. Alignment of cas3-cse1 genes from Streptomyces sp. SPB78 (1st sequence, Accession Number: ZP_07272643.1) [SEQ ID NO: 43], in Streptomyces griseus (2nd sequence, Accession Number YP_001825054) [SEQ ID NO: 44], and in Catenulispora acidiphila DSM 44928 (3rd sequence, Accession Number YP_003114638) [SEQ ID NO: 45] and an artificial E. coli Cas3-Cse1 fusion protein [SEQ ID NO: 46] which includes the polypeptide linker sequence from S. griseus.

FIG. 11 shows the design of a Cascade^(KKR/ELD) nuclease pair in which FokI nuclease domains are mutated such that only heterodimers consisting of KKR and ELD nuclease domains are formed, and the distance between the opposing binding sites may be varied to determine the optimal distance between a Cascade nuclease pair.

FIG. 12 is a schematic diagram showing genome targeting by a Cascade-FokI nuclease pair.

FIG. 13 shows an SDS PAGE gel of Cascade-nuclease complexes.

FIG. 14 shows electrophoresis gels of in vitro cleavage assays of Cascade^(KKR/ELD) on plasmid DNA.

FIG. 15 shows Cascade^(KKR/ELD) cleavage patterns and frequency [SEQ ID NO: 47].

EXAMPLES

Materials and Methods Used

Strains, Gene Cloning, Plasmids and Vectors

E. coli BL21-AI and E. coli BL21 (DE3) strains were used throughout. Table 1 lists all plasmids used in this study. The previously described pWUR408, pWUR480, pWUR404 and pWUR547 were used for production of Strep-tag II R44-Cascade, and pWUR408, pWUR514 and pWUR630 were used for production of Strep-tag II J3-Cascade (Jore et al., (2011) Nature Structural & Molecular Biology 18, 529-536; Semenova et al., (2011) Proceedings of the National Academy of Sciences of the United States of America 108, 10098-10103). pUC-λ (pWUR610) and pUC-p7 (pWUR613) have been described elsewhere (Jore et al., 2011; Semenova et al., 2011). The C85Venus protein is encoded by pWUR647, which corresponds to pET52b (Novagen) containing the synthetic GA1070943 construct (Table 2) (Geneart) cloned between the BamHI and NotI sites. The N155Venus protein is encoded by pWUR648, which corresponds to pRSF1b (Novagen) containing the synthetic GA1070941 construct (Table 2) (Geneart) cloned between the NotI and XhoI sites. The Cas3-C85Venus fusion protein is encoded by pWUR649, which corresponds to pWUR647 containing the Cas3 amplification product using primers BG3186 and BG3213 (Table 3) between the NcoI and BamHI sites. The CasA-N155Venus fusion protein is encoded by pWUR650, which corresponds to pWUR648 containing the CasA amplification product using primers BG3303 and BG3212 (Table 3) between the NcoI and BamHI sites. CRISPR 7Tm is encoded by pWUR651, which corresponds to pACYCDuet-1 (Novagen) containing the synthetic GA1068859 construct (Table 2) (Geneart) cloned between the NcoI and KpnI sites. The Cascade encoding pWUR400, the CascadeΔCse1 encoding pWUR401 and the Cas3 encoding pWUR397 were described previously (Jore et al., 2011). The Cas3H74A encoding pWUR652 was constructed using site directed mutagenesis of pWUR397 with primers BG3093 and BG3094 (Table 3).

TABLE 1 Plasmids used

Plasmid | Description and order of genes (5′-3′) | Restriction sites | Primers | Source
pWUR397 | cas3 in pRSF-1b, no tags | | | 1
pWUR400 | casA-casB-casC-casD-casE in pCDF-1b, no tags | | | 1
pWUR401 | casB-casC-casD-casE in pCDF-1b, no tags | | | 1
pWUR404 | casE in pCDF-1b, no tags | | | 1
pWUR408 | casA in pRSF-1b, no tags | | | 1
pWUR480 | casB with Strep-tag II (N-term)-casC-casD in pET52b | | | 1
pWUR514 | casB with Strep-tag II (N-term)-casC-casD-CasE in pET52b | | | 2
pWUR547 | E. coli R44 CRISPR, 7x spacer nr. 2, in pACYCDuet-1 | | | 2
pWUR613 | pUC-p7; pUC19 containing R44-protospacer on a 350 bp phage P7 amplicon | | | 2
pWUR630 | CRISPR poly J3, 5x spacer J3 in pACYCDuet-1 | NcoI/KpnI | | This study
pWUR610 | pUC-λ; pUC19 containing J3-protospacer on a 350 bp phage λ amplicon | | | 3
pWUR647 | C85Venus; GA1070943 (Table S1) in pET52b | BamHI/NotI | | This study
pWUR648 | N155Venus; GA1070941 (Table S1) in pRSF1b | NotI/XhoI | | This study
pWUR649 | cas3-C85Venus; pWUR647 containing cas3 amplicon | NcoI/BamHI | BG3186 + BG3213 | This study
pWUR650 | casA-N155Venus; pWUR648 containing casA amplicon | NcoI/NotI | BG3303 + BG3212 | This study
pWUR651 | CRISPR 7Tm; GA1068859 (Table S1) in pACYCDuet-1 | NcoI/KpnI | | This study
 | casB with Strep-tag II (N-term)-casC-casD-CasE in pCDF-1b | | | This study
 | cas3-casA fusion | | | This study
 | cas3H74A-CasA fusion | | | This study
 | cas3D75A-CasA fusion | | | This study
 | cas3K320N-CasA fusion | | | This study
 | cas3D452N-CasA fusion | | | This study

-   Source 1 in the table above is Brouns et al (2008) Science 321, 960-964.
-   Source 2 in the table above is Jore et al (2011) Nature Structural & Molecular Biology 18: 529-537.

TABLE 2 Synthetic Constructs GA1070943 [SEQ ID NO: 6]ACTGGAAAGCGGGCAGTGAAAGGAAGGCCCATGAGGCCAGTTAATTAAGCGGATCCTGGCGGCGGCAGCGGCGGCGGCAGCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCGGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCTACCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAGCGGCCGCGGCGCGCCTAGGCCTTGACGGCCTTCCTTCAATTCGCCCT ATAGTGAG GA1070941[SEQ ID NO: 7] CACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCGCATTTAATTAAGCGGCCGCAGGCGGCGGCAGCGGCGGCGGCAGCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGCTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTCGGCTACGGCCTGCAGTGCTTCGCCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCACGGCCTAACTCGAGGGCGCGCCCTGGGCCTCATGGGCCTTCCGCTCACTGCCCGCTTTCCAG GA1068859 [SEQ ID NO: 8]CACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCGCATGAGCTCCATGGAAACAAAGAATTAGCTGATCTTTAATAATAAGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAATGCTTTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGGGGATAAACCGGGCCGATTGAAGGTCCGGTGGATGGCTTAAAAGAGTTCCCCGCGCCAGCGGGGATAAACCGCCGCAGGTACAGCAGGTAGCGCAGATCATCAAGAGTTCCCCGCGCCAGCGGGGATAAACCGACTTCTCTCCGAAAAGTCAGGACGCTGTGGCAGAGTTCCCCGCGCCAGCGGGGATAAACCGCCTACGCGCTGAACGCCAGCGGTGTGGTGAATGAGTTCCCCGCGCCAGCGGGGATAAACCGGTGTGGCCATGCACGCCTTTAACGGTGAACTGGAGTTCCCCGCGCCAGCGGGGATAAACCGCACGAACTCAGCCAGAACGACAAACAAAAGGCGAGTTCCCCGCGCCAGCGGGGATAAACCGGCACCAGTACGCGCCCCACGCTGACGGTTTCTGAGTTCCCCGCGCCAGCGGGGATAAACCGCAGCTCCCATTTTCAAACCCAGGTACCCTGGGCCTCATGGGCCTTCCGCTCACTGCCCGC TTTCCAG GA1047360[SEQ ID NO: 9] 
GAGCTCCCGGGCTGACGGTAATAGAGGCACCTACAGGCTCCGGTAAAACGGAAACAGCGCTGGCCTATGCTTGGAAACTTATTGATCAACAAATTGCGGATAGTGTTATTTTTGCCCTCCCAACACAAGCTACCGCGAATGCTATGCTTACGAGAATGGAAGCGAGCGCGAGCCACTTATTTTCATCCCCAAATCTTATTCTTGCTCATGGCAATTCACGGTTTAACCACCTCTTTCAATCAATAAAATCACGCGCGATTACTGAACAGGGGCAAGAAGAAGCGTGGGTTCAGTGTTGTCAGTGGTTGTCACAAAGCAATAAGAAAGTGTTTCTTGGGCAAATCGGCGTTTGCACGATTGATCAGGTGTTGATTTCGGTATTGCCAGTTAAACACCGCTTTATCCGTGGTTTGGGAATTGGTAGATCTGTTTTAATTGTTAATGAAGTTCATGCTTACGACACCTATATGAACGGCTTGCTCGAGGCAGTGCTCAAGGCTCAGGCTGATGTGGGAGGGAGTGTTATTCTTCTTTCCGCAACCCTACCAATGAAACAAAAACAGAAGCTTCTGGATACTTATGGTCTGCATACAGATCCAGTGGAAAATAACTCCGCATATCCACTCATTAACTGGCGAGGTGTGAATGGTGCGCAACGTTTTGATCTGCTAG CGGATCCGGTACC

TABLE 3 Primers

BG3186 ATAGCGCCATGGAACCTTTTAAATATATATGCCATTA [SEQ ID NO: 10]
BG3213 ACAGTGGGATCCGCTTTGGGATTTGCAGGGATGACTCTGGT [SEQ ID NO: 11]
BG3303 ATAGCGTCATGAATTTGCTTATTGATAACTGGATTCCTGTACG [SEQ ID NO: 12]
BG3212 ACAGTGGCGGCCGCGCCATTTGATGGCCCTCCTTGCGGTTTTAA [SEQ ID NO: 13]
BG3076 CGTATATCAAACTTTCCAATAGCATGAAGAGCAATGAAAAATAAC [SEQ ID NO: 14]
BG3449 ATGATACCGCGAGACCCACGCTC [SEQ ID NO: 15]
BG3451 CGGATAAAGTTGCAGGACCACTTC [SEQ ID NO: 16]
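As an illustrative check that is not part of the original methods, the restriction sites used for cloning can be located in the amplification primers of Table 3. The Python sketch below searches each primer for standard recognition sequences. The names `SITES`, `PRIMERS` and `find_sites` are assumptions made for this example. BspHI (TCATGA) is included only because its overhang is compatible with an NcoI-cut vector; the text does not state which enzyme was used to digest the BG3303 amplicon.

```python
# Standard recognition sequences (5'->3') of the enzymes considered here.
SITES = {
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "BspHI": "TCATGA",  # assumption: listed because its overhang matches NcoI
}

# Cloning primers copied from Table 3.
PRIMERS = {
    "BG3186": "ATAGCGCCATGGAACCTTTTAAATATATATGCCATTA",
    "BG3213": "ACAGTGGGATCCGCTTTGGGATTTGCAGGGATGACTCTGGT",
    "BG3303": "ATAGCGTCATGAATTTGCTTATTGATAACTGGATTCCTGTACG",
    "BG3212": "ACAGTGGCGGCCGCGCCATTTGATGGCCCTCCTTGCGGTTTTAA",
}


def find_sites(seq):
    """Return {enzyme: 0-based start index} for every site present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for primer_name, primer_seq in PRIMERS.items():
        print(primer_name, find_sites(primer_seq))
```

In each of these four primers the recognition site follows a six-nucleotide 5′ extension, a common primer design that gives the enzyme flanking bases to bind.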

Protein Production and Purification

Cascade was expressed and purified as described (Jore et al., 2011). Throughout purification a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 2 mM EDTA was used for resuspension and washing. Protein elution was performed in the same buffer containing 4 mM desthiobiotin. The Cascade-Cas3 fusion complex was expressed and purified in the same manner, with washing steps being performed with 20 mM HEPES pH 7.5, 200 mM NaCl and 1 mM DTT, and elution in 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT containing 4 mM desthiobiotin.

Electrophoretic Mobility Shift Assay

Purified Cascade or Cascade subcomplexes were mixed with pUC-λ in a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 2 mM EDTA, and incubated at 37° C. for 15 minutes. Samples were run overnight on a 0.8% TAE agarose gel and post-stained with SYBR Safe (Invitrogen) at a 1:10000 dilution in TAE for 30 minutes. Cleavage with BsmI (Fermentas) or Nt.BspQI (New England Biolabs) was performed in the HEPES reaction buffer supplemented with 5 mM MgCl₂.

Scanning Force Microscopy

Purified Cascade was mixed with pUC-λ (at a ratio of 7:1, 250 nM Cascade, 35 nM DNA) in a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl, 0.2 mM DTT, 0.3 mM EDTA and incubated at 37° C. for 15 minutes. Subsequently, for AFM sample preparation, the incubation mixture was diluted 10× in double distilled water and MgCl₂ was added at a final concentration of 1.2 mM. Deposition of the protein-DNA complexes and imaging was carried out as described before (Dame et al., (2000) Nucleic Acids Res. 28: 3504-3510).

Fluorescence Microscopy

BL21-AI cells carrying CRISPR and cas gene encoding plasmids were grown overnight at 37° C. in Luria-Bertani broth (LB) containing ampicillin (100 μg/ml), kanamycin (50 μg/ml), streptomycin (50 μg/ml) and chloramphenicol (34 μg/ml). Overnight culture was diluted 1:100 in fresh antibiotic-containing LB, and grown for 1 hour at 37° C. Expression of cas genes and CRISPR was induced for 1 hour by adding L-arabinose to a final concentration of 0.2% and IPTG to a final concentration of 1 mM. For infection, cells were mixed with phage Lambda at a Multiplicity of Infection (MOI) of 4. Cells were applied to poly-L-lysine covered microscope slides, and analyzed using a Zeiss LSM510 confocal laser scanning microscope based on an Axiovert inverted microscope, with a 40× oil immersion objective (N.A. of 1.3) and an argon laser as the excitation source (514 nm) and detection at 530-600 nm. The pinhole was set at 203 μm for all measurements.

pUC-λ Transformation Studies

LB containing kanamycin (50 μg/ml), streptomycin (50 μg/ml) and chloramphenicol (34 μg/ml) was inoculated from an overnight pre-inoculum and grown to an OD₆₀₀ of 0.3. Expression of cas genes and CRISPR was induced for 45 minutes with 0.2% L-arabinose and 1 mM IPTG. Cells were collected by centrifugation at 4° C. and made competent by resuspension in ice cold buffer containing 100 mM RbCl, 50 mM MnCl₂, 30 mM potassium acetate, 10 mM CaCl₂ and 15% glycerol, pH 5.8. After a 3 hour incubation, cells were collected and resuspended in a buffer containing 10 mM MOPS, 10 mM RbCl, 75 mM CaCl₂, 15% glycerol, pH 6.8. Transformation was performed by adding 80 ng pUC-λ, followed by a 1 minute heat-shock at 42° C., and a 5 minute cold-shock on ice. Next, cells were grown in LB for 45 minutes at 37° C. before plating on LB-agar plates containing 0.2% L-arabinose, 1 mM IPTG, ampicillin (100 μg/ml), kanamycin (50 μg/ml), streptomycin (50 μg/ml) and chloramphenicol (34 μg/ml).

Plasmid curing was analyzed by transforming BL21-AI cells containing cas gene and CRISPR encoding plasmids with pUC-λ, while growing the cells in the presence of 0.2% glucose to suppress expression of the T7-polymerase gene. Expression of cas genes and CRISPR was induced by collecting the cells and re-suspension in LB containing 0.2% arabinose and 1 mM IPTG. Cells were plated on LB-agar containing either streptomycin, kanamycin and chloramphenicol (non-selective for pUC-λ) or ampicillin, streptomycin, kanamycin and chloramphenicol (selective for pUC-λ). After overnight growth the percentage of plasmid loss can be calculated from the ratio of colony forming units on the selective and non-selective plates.
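The calculation of plasmid loss from colony counts can be sketched as follows. This is a minimal Python illustration, assuming that both plates receive the same dilution and plated volume and that loss is taken as one minus the selective to non-selective ratio; the function name `plasmid_loss_percent` and the example counts are hypothetical and are not data from this study.

```python
def plasmid_loss_percent(cfu_selective, cfu_nonselective):
    """Percentage of cells that lost pUC-lambda.

    cfu_selective:    colonies on plates selective for pUC-lambda
    cfu_nonselective: colonies on plates non-selective for pUC-lambda
    Assumes equal dilution and plated volume on both plates.
    """
    if cfu_nonselective <= 0:
        raise ValueError("non-selective plate must have at least one colony")
    retained_fraction = cfu_selective / cfu_nonselective
    return (1.0 - retained_fraction) * 100.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(plasmid_loss_percent(25, 100))
```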

Phage Lambda Infection Studies

Host sensitivity to phage infection was tested using a virulent phage Lambda (λ_(vir)), as in Brouns et al (2008) Science 321, 960-964. The sensitivity of the host to infection was calculated as the efficiency of plaquing (the plaque count ratio of a strain containing an anti-λ CRISPR to that of the strain containing a non-targeting R44 CRISPR) as described in Brouns et al (2008).
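The efficiency of plaquing defined above is a simple ratio, sketched here in Python. The function name `efficiency_of_plaquing` and the plaque counts are hypothetical, and the sketch assumes both strains are challenged with the same phage dilution.

```python
def efficiency_of_plaquing(plaques_anti_lambda, plaques_r44):
    """Plaque count on the anti-lambda CRISPR strain divided by the
    plaque count on the non-targeting R44 CRISPR strain."""
    if plaques_r44 <= 0:
        raise ValueError("reference strain must show at least one plaque")
    return plaques_anti_lambda / plaques_r44


if __name__ == "__main__":
    # Hypothetical counts: a value near 1 indicates a sensitive host,
    # a value far below 1 indicates CRISPR-mediated resistance.
    print(efficiency_of_plaquing(5, 500))
```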

Example 1 Cascade Exclusively Binds Negatively Supercoiled Target DNA

The 3 kb pUC19-derived plasmid denoted pUC-λ contains a 350 bp DNA fragment corresponding to part of the J gene of phage λ, which is targeted by J3-Cascade (Cascade associated with crRNA containing spacer J3; Westra et al (2010) Molecular Microbiology 77, 1380-1393). The electrophoretic mobility shift assays show that Cascade has high affinity only for negatively supercoiled (nSC) target plasmid. At a molar ratio of J3-Cascade to pUC-λ of 6:1 all nSC plasmid was bound by Cascade (see FIG. 1A), while Cascade carrying the non-targeting crRNA R44 (R44-Cascade) displayed non-specific binding at a molar ratio of 128:1 (see FIG. 1B). The dissociation constant (Kd) of nSC pUC-λ was determined to be 13±1.4 nM for J3-Cascade (see FIG. 1E) and 429±152 nM for R44-Cascade (see FIG. 1F).
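To give a feel for what these Kd values imply, the following Python sketch computes the expected fraction of plasmid bound under a simple one-site binding model, with free Cascade assumed to be in excess over DNA. The model, the function name `fraction_bound` and the chosen concentrations are assumptions for illustration; the text does not state which model was used to derive the Kd values.

```python
def fraction_bound(cascade_nM, kd_nM):
    """One-site binding isotherm: fraction of DNA bound at a given
    free Cascade concentration (both values in nM)."""
    if cascade_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return cascade_nM / (kd_nM + cascade_nM)


KD_J3 = 13.0    # nM, J3-Cascade on nSC pUC-lambda (from the text)
KD_R44 = 429.0  # nM, R44-Cascade on nSC pUC-lambda (from the text)

if __name__ == "__main__":
    for conc in (13.0, 50.0, 429.0):  # illustrative concentrations
        print(conc, fraction_bound(conc, KD_J3), fraction_bound(conc, KD_R44))
```

Under this model half of the plasmid is bound when the Cascade concentration equals Kd, so the roughly 33-fold difference in Kd separates specific from non-specific binding over a wide concentration range.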

J3-Cascade was unable to bind relaxed target DNA with measurable affinity, such as nicked (see FIG. 1C) or linear pUC-λ (see FIG. 1D), showing that Cascade has high affinity for larger DNA substrates with an nSC topology.

To distinguish non-specific binding from specific binding, the BsmI restriction site located within the protospacer was used. While adding BsmI enzyme to pUC-λ gives a linear product in the presence of R44-Cascade (see FIG. 1G, lane 4), pUC-λ is protected from BsmI cleavage in the presence of J3-Cascade (see FIG. 1G, lane 7), indicating specific binding to the protospacer. This shows that Cas3 is not required for in vitro sequence specific binding of Cascade to a protospacer sequence in an nSC plasmid.

Cascade binding to nSC pUC-λ was followed by nicking with Nt.BspQI, giving rise to an OC topology. Cascade is released from the plasmid after strand nicking, as can be seen from the absence of a mobility shift (see FIG. 1H, compare lane 8 to lane 10). In contrast, Cascade remains bound to its DNA target when a ssDNA probe complementary to the displaced strand is added to the reaction before DNA cleavage by Nt.BspQI (see FIG. 1H, lane 9). The probe artificially stabilizes the Cascade R-loop on relaxed target DNA. Similar observations are made when both DNA strands of pUC-λ are cleaved after Cascade binding (see FIG. 1I, lane 8 and lane 9).

Example 2 Cascade Induces Bending of Bound Target DNA

Complexes formed between purified Cascade and pUC-λ were visualized. Specific complexes containing a single bound J3-Cascade complex were formed, while non-specific R44-Cascade yielded no DNA bound complexes in this assay under identical conditions. Out of 81 DNA molecules observed, 76% were found to have J3-Cascade bound (see FIGS. 2A-P). In most of these complexes Cascade was found at the apex of a loop (86%), whereas only a small fraction was found at non-apical positions (14%). These data show that Cascade binding causes bending and possibly wrapping of the DNA, probably to facilitate local melting of the DNA duplex.

Example 3 Naturally Occurring Fusions of Cas3 and Cse1: Cas3 Interacts with Cascade Upon Protospacer Recognition

Sequence analysis of cas3 genes from organisms containing the Type I-E CRISPR/Cas system (Figure S3) reveals that Cas3 and Cse1 occur as fusion proteins in Streptomyces sp. SPB78 (Accession Number: ZP07272643.1), in Streptomyces griseus (Accession Number YP_001825054), and in Catenulispora acidiphila DSM 44928 (Accession Number YP_003114638).

Example 4 Bimolecular Fluorescence Complementation (BiFC) Shows how a Cse1 Fusion Protein Forming Part of Cascade Continues to Interact with Cas3

BiFC experiments were used to monitor interactions between Cas3 and Cascade in vivo before and after phage λ infection. BiFC experiments rely on the capacity of the non-fluorescent halves of a fluorescent protein, e.g., Yellow Fluorescent Protein (YFP), to refold and to form a fluorescent molecule when the two halves occur in close proximity. As such, it provides a tool to reveal protein-protein interactions, since the efficiency of refolding is greatly enhanced if the local concentrations are high, e.g., when the two halves of the fluorescent protein are fused to interaction partners. Cse1 was fused at the C-terminus with the N-terminal 155 amino acids of Venus (Cse1-N155Venus), an improved version of YFP (Nagai et al (2002) Nature Biotechnology 20, 87-90). Cas3 was C-terminally fused to the C-terminal 85 amino acids of Venus (Cas3-C85Venus).

BiFC analysis reveals that Cascade does not interact with Cas3 in the absence of invading DNA (FIG. 3ABC, FIG. 3P and FIG. 8). Upon infection with phage λ, however, cells expressing CascadeΔCse1, Cse1-N155Venus and Cas3-C85Venus are fluorescent if they co-express the anti-λ CRISPR 7Tm (FIG. 3DEF, FIG. 3P and FIG. 8). When they co-express a non-targeting CRISPR R44 (FIG. 3GHI, FIG. 3P and FIG. 8), the cells remain non-fluorescent. This shows that Cascade and Cas3 specifically interact during infection upon protospacer recognition and that Cse1 and Cas3 are in close proximity of each other in the Cascade-Cas3 binary effector complex.

These results also show quite clearly that a fusion of Cse1 with a heterologous protein does not disrupt the ribonucleoprotein formation of Cascade and crRNA, nor does it disrupt the interaction of Cascade and Cas3 with the target phage DNA, even when the Cas3 itself is also a fusion protein.

Example 5 Preparing a Designed Cas3-Cse1 Fusion Gives a Protein with In Vivo Functional Activity

Providing in vitro evidence for Cas3 DNA cleavage activity required purified and active Cas3. Despite various solubilization strategies, Cas3 overproduced (Howard et al (2011) Biochem. J. 439, 85-95) in E. coli BL21 is mainly present in inactive aggregates and inclusion bodies. Cas3 was therefore produced as a Cas3-Cse1 fusion protein, containing a linker identical to that of the Cas3-Cse1 fusion protein in S. griseus (see FIG. 10). When co-expressed with CascadeΔCse1 and CRISPR J3, the fusion-complex was soluble and was obtained in high purity with the same apparent stoichiometry as Cascade (FIG. 5A). When functionality of this complex was tested for providing resistance against phage λ infection, the efficiency of plaquing (eop) on cells expressing the fusion-complex J3-Cascade-Cas3 was identical to that on cells expressing the separate proteins (FIG. 5B).

Since the J3-Cascade-Cas3 fusion-complex was functional in vivo, in vitro DNA cleavage assays were carried out using this complex. When J3-Cascade-Cas3 was incubated with pUC-λ in the absence of divalent metals, plasmid binding was observed at molar ratios similar to those observed for Cascade (FIG. 5C), while non-specific binding to a non-target plasmid (pUC-p7, a pUC19 derived plasmid of the same size as pUC-λ, but lacking a protospacer) occurred only at high molar ratios (FIG. 5D), indicating that non-specific DNA binding of the complex is also similar to that of Cascade alone.

Interestingly, the J3-Cascade-Cas3 fusion complex displays magnesium dependent endonuclease activity on nSC target plasmids. In the presence of 10 mM Mg²⁺ J3-Cascade-Cas3 nicks nSC pUC-λ (FIG. 5E, lanes 3-7), but no cleavage is observed for substrates that do not contain the target sequence (FIG. 5E, lanes 9-13), or that have a relaxed topology. No shift of the resulting OC band is observed, in line with previous observations that Cascade dissociates spontaneously after cleavage, without requiring ATP-dependent Cas3 helicase activity. Instead, the helicase activity of Cas3 appears to be involved in exonucleolytic plasmid degradation. When both magnesium and ATP are added to the reaction, full plasmid degradation occurs (FIG. 5H).

The inventors have found that Cascade alone is unable to bind protospacers on relaxed DNA. In contrast, the inventors have found that Cascade efficiently locates targets in negatively supercoiled DNA, and subsequently recruits Cas3 via the Cse1 subunit. Endonucleolytic cleavage by the Cas3 HD-nuclease domain causes spontaneous release of Cascade from the DNA through the loss of supercoiling, remobilizing Cascade to locate new targets. The target is then progressively unwound and cleaved by the joint ATP-dependent helicase activity and HD-nuclease activity of Cas3, leading to complete target DNA degradation and neutralization of the invader.

Referring to FIG. 6 and without wishing to be bound to any particular theory, a mechanism of operation for the CRISPR-interference type I pathway in E. coli may involve the following steps. (1) Cascade carrying a crRNA scans the nSC plasmid DNA for a protospacer with adjacent PAM. Whether strand separation occurs during this stage is unknown. (2) Sequence specific protospacer binding is achieved through basepairing between the crRNA and the complementary strand of the DNA, forming an R-loop. Upon binding, Cascade induces bending of the DNA. (3) The Cse1 subunit of Cascade recruits Cas3 upon DNA binding. This may be achieved by Cascade conformational changes that take place upon nucleic acid binding. (4) The HD-domain (darker part) of Cas3 catalyzes Mg²⁺-dependent nicking of the displaced strand of the R-loop, thereby altering the topology of the target plasmid from nSC to relaxed OC. (5a and 5b) The plasmid relaxation causes spontaneous dissociation of Cascade. Meanwhile Cas3 displays ATP-dependent exonuclease activity on the target plasmid, requiring the helicase domain for target dsDNA unwinding and the HD-nuclease domain for successive cleavage activity. (6) Cas3 degrades the entire plasmid in an ATP-dependent manner as it processively moves along, unwinds and cleaves the target dsDNA.

Example 6 Preparation of Artificial Cas-Strep Tag Fusion Proteins and Assembly of Cascade Complexes

Cascade complexes are produced and purified as described in Brouns et al (2008) Science 321: 960-4, using the expression plasmids listed in Supplementary Table 3 of Jore et al (2011) Nature Structural & Molecular Biology 18: 529-537. Cascade is routinely purified with an N-terminal Strep-tag II fused to CasB (or CasC in CasCDE). Size exclusion chromatography (Superdex 200 HR 10/30 (GE)) is performed using 20 mM Tris-HCl (pH 8.0), 0.1 M NaCl, 1 mM dithiothreitol. Cascade preparations (˜0.3 mg) are incubated with DNase I (Invitrogen) in the presence of 2.5 mM MgCl₂ for 15 min at 37° C. prior to size exclusion analysis. Co-purified nucleic acids are isolated by extraction using an equal volume of phenol:chloroform:isoamylalcohol (25:24:1) pH 8.0 (Fluka), and incubated with either DNase I (Invitrogen) supplemented with 2.5 mM MgCl₂ or RNase A (Fermentas) for 10 min at 37° C. Cas subunit proteins fused to the amino acid sequence of Strep-Tag are produced.

Plaque assays showing the biological activity of the Strep-Tag Cascade subunits are performed using bacteriophage Lambda and the efficiency of plaquing (EOP) is calculated as described in Brouns et al (2008).

For purification of crRNA, samples are analyzed by ion-pair reversed-phase HPLC on an Agilent 1100 HPLC with UV_(260nm) detector (Agilent) using a DNAsep column 50 mm×4.6 mm I.D. (Transgenomic, San Jose, Calif.). The chromatographic analysis is performed using the following buffer conditions: A) 0.1 M triethylammonium acetate (TEAA) (pH 7.0) (Fluka); B) buffer A with 25% LC MS grade acetonitrile (v/v) (Fisher). crRNA is obtained by injecting purified intact Cascade at 75° C. using a linear gradient starting at 15% buffer B and extending to 60% B in 12.5 min, followed by a linear extension to 100% B over 2 min at a flow rate of 1.0 ml/min. Hydrolysis of the cyclic phosphate terminus is performed by incubating the HPLC-purified crRNA in a final concentration of 0.1 M HCl at 4° C. for 1 hour. The samples are concentrated to 5-10 μl on a vacuum concentrator (Eppendorf) prior to ESI-MS analysis.
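The two-step gradient described above can be written as a piecewise-linear function of time. The Python sketch below is only an illustration of that description: the names `GRADIENT`, `percent_b` and `acetonitrile_percent` are hypothetical, time zero is taken as the start of the gradient, and the composition is assumed to hold at 100% B after the final step.

```python
# (time in min, % buffer B) breakpoints taken from the description:
# 15% B at start, 60% B after 12.5 min, 100% B after a further 2 min.
GRADIENT = [(0.0, 15.0), (12.5, 60.0), (14.5, 100.0)]

ACETONITRILE_IN_B = 0.25  # buffer B is buffer A with 25% (v/v) acetonitrile


def percent_b(t_min):
    """Linearly interpolate % buffer B at time t_min (minutes)."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # assumption: hold at final composition


def acetonitrile_percent(t_min):
    """Acetonitrile content (% v/v) of the mobile phase at time t_min."""
    return percent_b(t_min) * ACETONITRILE_IN_B


if __name__ == "__main__":
    for t in (0.0, 6.25, 12.5, 14.5):
        print(t, percent_b(t), acetonitrile_percent(t))
```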

Electrospray Ionization Mass spectrometry analysis of crRNA is performed in negative mode using an UHR-TOF mass spectrometer (maXis) or an HCT Ultra PTM Discovery instrument (both Bruker Daltonics), coupled to an online capillary liquid chromatography system (Ultimate 3000, Dionex, UK). RNA separations are performed using a monolithic (PS-DVB) capillary column (200 μm×50 mm I.D., Dionex, UK). The chromatography is performed using the following buffer conditions: C) 0.4 M 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP, Sigma-Aldrich) adjusted with triethylamine (TEA) to pH 7.0 and 0.1 mM TEAA, and D) buffer C with 50% methanol (v/v) (Fisher). RNA analysis is performed at 50° C. with 20% buffer D, extending to 40% D in 5 min followed by a linear extension to 60% D over 8 min at a flow rate of 2 μl/min.

Cascade protein is analyzed by native mass spectrometry in 0.15 M ammonium acetate (pH 8.0) at a protein concentration of 5 μM. The protein preparation is obtained by five sequential concentration and dilution steps at 4° C. using a centrifugal filter with a cut-off of 10 kDa (Millipore). Proteins are sprayed from borosilicate glass capillaries and analyzed on a LCT electrospray time-of-flight or modified quadrupole time-of-flight instruments (both Waters, UK) adjusted for optimal performance in high mass detection (see Tahallah N et al (2001) Rapid Commun Mass Spectrom 15: 596-601 and van den Heuvel, R. H. et al. (2006) Anal Chem 78: 7473-83). Exact mass measurements of the individual Cas proteins are acquired under denaturing conditions (50% acetonitrile, 50% MQ, 0.1% formic acid). Sub-complexes in solution are generated by the addition of 2-propanol to the spray solution to a final concentration of 5% (v/v). Instrument settings are as follows: needle voltage ˜1.2 kV, cone voltage ˜175 V, source pressure 9 mbar. Xenon is used as the collision gas for tandem mass spectrometric analysis at a pressure of 1.5×10⁻² mbar. The collision voltage is varied between 10-200 V.

Electrophoretic mobility shift assays (EMSA) are used to demonstrate the functional activity of Cascade complexes for target nucleic acids. EMSA is performed by incubating Cascade, CasBCDE or CasCDE with 1 nM labelled nucleic acid in 50 mM Tris-Cl pH 7.5, 100 mM NaCl. Salmon sperm DNA (Invitrogen) is used as competitor. EMSA reactions are incubated at 37° C. for 20-30 min prior to electrophoresis on 5% polyacrylamide gels. The gels are dried and analyzed using phosphor storage screens and a PMI phosphor imager (Bio-Rad). Target DNA binding and cleavage activity of Cascade is tested in the presence of 1-10 mM Ca, Mg or Mn ions.

DNA targets are gel-purified long oligonucleotides (Isogen Life Sciences or Biolegio), listed in Supplementary Table 3 of Jore et al (2011). The oligonucleotides are end-labeled using γ³²P-ATP (PerkinElmer) and T4 kinase (Fermentas). Double-stranded DNA targets are prepared by annealing complementary oligonucleotides and digesting remaining ssDNA with Exonuclease I (Fermentas). Labelled RNA targets are in vitro transcribed using T7 Maxiscript or T7 Mega Shortscript kits (Ambion) with α³²P-CTP (PerkinElmer) and removing template by DNase I (Fermentas) digestion. Double stranded RNA targets are prepared by annealing complementary RNAs and digesting surplus ssRNA with RNase T1 (Fermentas), followed by phenol extraction.

Plasmid mobility shift assays are performed using plasmid pWUR613 containing the R44 protospacer. The fragment containing the protospacer is PCR-amplified from bacteriophage P7 genomic DNA using primers BG3297 and BG3298 (see Supplementary Table 3 of Jore et al (2011)). Plasmid (0.4 μg) and Cascade are mixed in a 1:10 molar ratio in a buffer containing 5 mM Tris-HCl (pH 7.5) and 20 mM NaCl and incubated at 37° C. for 30 minutes.

Cascade proteins are then removed by proteinase K treatment (Fluka) (0.15 U, 15 min, 37° C.) followed by phenol/chloroform extraction. RNA-DNA complexes are then treated with RNaseH (Promega) (2 U, 1 h, 37° C.).

Strep-Tag-Cas protein subunit fusions which form Cascade protein complexes or active sub-complexes with the RNA component (equivalent to a crRNA) have the expected biological and functional activity of scanning and specific attachment and cleavage of nucleic acid targets. Fusions of the Cas subunits with the amino acid chains of fluorescent dyes also form Cascade complexes and sub-complexes with the RNA component (equivalent to crRNA) which retain biological and functional activity and allow visualisation of the location of a target nucleic acid sequence in ds DNA, for example.

Example 7 A Cascade-Nuclease Pair and Test of Nuclease Activity In Vitro

Six mutations designated “Sharkey” have been introduced by random mutagenesis and screening to improve nuclease activity and stability of the non-specific nuclease domain from Flavobacterium okeanokoites restriction enzyme FokI (see Guo, J., et al. (2010) J. Mol. Biol. 400: 96-107). Other mutations have been introduced that reduce off-target cleavage activity. This is achieved by engineering electrostatic interactions at the FokI dimer interface of a ZFN pair, creating one FokI variant with a positively charged interface (KKR, E490K, I538K, H537R) and another with a negatively charged interface (ELD, Q486E, I499L, N496D) (see Doyon, Y., et al. (2011) Nature Methods 8: 74-9). Each of these variants is catalytically inactive as a homodimer, thereby reducing the frequency of off-target cleavage.
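The obligate-heterodimer logic described above can be captured in a deliberately simplified boolean model, sketched here in Python. The names `FOKI_VARIANTS` and `is_cleavage_competent` are hypothetical, and the model ignores binding geometry, spacer distance and any residual homodimer activity; it only encodes the statement that a KKR/ELD pair is active while each homodimer is not.

```python
# Interface mutations of the two FokI variants, as listed in the text.
FOKI_VARIANTS = {
    "KKR": ("E490K", "I538K", "H537R"),  # positively charged interface
    "ELD": ("Q486E", "I499L", "N496D"),  # negatively charged interface
}


def is_cleavage_competent(variant_a, variant_b):
    """Return True only for a KKR/ELD heterodimer (simplified model)."""
    for variant in (variant_a, variant_b):
        if variant not in FOKI_VARIANTS:
            raise ValueError(f"unknown FokI variant: {variant}")
    return {variant_a, variant_b} == {"KKR", "ELD"}


if __name__ == "__main__":
    for pair in (("KKR", "ELD"), ("KKR", "KKR"), ("ELD", "ELD")):
        print(pair, is_cleavage_competent(*pair))
```

In this picture an off-target site bound by two copies of the same variant would not be cut, which is the rationale given for the reduced off-target cleavage.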

Cascade-Nuclease Design

We translationally fused improved FokI nucleases to the N-terminus of Cse1 to generate variants of Cse1 being FokI^(KKR)-Cse1 and FokI^(ELD)-Cse1, respectively. These two variants are co-expressed with Cascade subunits (Cse2, Cas7, Cas5 and Cas6e), and one of two distinct CRISPR plasmids with uniform spacers. This loads the Cascade^(KKR) complex with uniform P7-crRNA, and the Cascade^(ELD) complex with uniform M13 g8-crRNA. These complexes are purified using the N-terminally StrepII-tagged Cse2 as described in Jore, M. M., et al., (2011) Nat. Struct. Mol. Biol. 18(5): 529-536. Furthermore, an additional purification step can be carried out using an N-terminally HIS-tagged FokI, to ensure purification of full length and intact Cascade-nuclease fusion complexes.

The nucleotide and amino acid sequences of the fusion proteins used in this example were as follows:

>nucleotide sequence of FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 18]ATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCCGCATGAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTTTTTATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGGTAGCCCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGAAATGGAACGTTATGTGGAAGAAAATCAGACCCGTGATAAACATCTGAATCCGAATGAATGGTGGAAAGTTTATCCGAGCAGCGTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAATCATATTACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCACCCTGACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggggaaagtccaaatcataaatctgcaatcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacgatatggaactggccgctttagcactgctggtttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcgcataatgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgcagaacatccattatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgttattcaaccaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgttcgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtatcaaaaacaatttcctaatgaatcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaattgggtttgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttgctgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggccccatccgcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgcaccatcatggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaattcagaaatattgcgccgcaaagtcctatgaattgattatggggggatatcgtaataatcaagcatctattcttgaacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttcaaaggggccggagtctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcc
cgatgtactggcgaatgttaatttttcccaggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtgaaatgctatttaatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgctatacaaacatttacgggagttaaaaccgcaaggagggccatcaaatggctga >protein sequence of FokI-(Sharkey-ELD)-Cse1[SEQ ID NO: 19]MAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMERYVEENQTRDKHLNPNEWWKVYPSSVTEEKELEVSGHFKGNYKAQLTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKENNGEINEADPTNRAKGLEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGEGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGELKEKETFTVNGLWPHPHSPCLVTVKKGEVEEKELAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELEVIGGYRNNQASILERRHDVLMENQGWQQYGNVINEIVTVGLGYKTALRKALYTEAEGEKNKDEKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG* >nucleotide sequence of FokI-(Sharkey -KKR)-Cse1[SEQ ID NO: 
20]ATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCCGCATGAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTTTTTATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGGTAGCCCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGAAATGCAGCGTTATGTGAAAGAAAATCAGACCCGCAACAAACATATTAACCCGAATGAATGGTGGAAAGTTTATCCGAGCAGCGTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAATCGTAAAACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCACCCTGACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggggaaagtccaaatcataaatctgcaatcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacgatatggaactggccgctttagcactgctggtttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcgcataatgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgcagaacatccctttatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgttattcaaccaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgttcgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtcttcaaaaacaatttcctaatgaatcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaattgggtttgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttgctgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggccccatccgcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgcaccatcatggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaattcagaaatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaataatcaagcatctattcttgaacggcgtcatgatgtgttgatgttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttcaaaggggccggagtctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcccgatgtactggcgaatgttaatttttcccaggctgatgaggtaatagctgatttacga
gacaaacttcatcaattgtgtgaaatgctatttaatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgctatacaaacatttacgggagttaaaaccgcaaggagggccatcaaatggctga >protein sequence of FokI-(Sharkey-KKR)-Cse1[SEQ ID NO: 21]MAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNRKTNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKENNGEINFADPTNRAKGLEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGEGGGEKSGLRGGTPVTTEVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGELKEKETFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASILERRHDVLMENQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGEKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG*>nucleotide sequence of His₆-Dual-monopartite NLS SV40-FokI-(Sharkey-KKR)-Cse1(“His₆” disclosed as SEQ ID NO: 48) [SEQ ID NO: 
22]ATGcatcaccatcatcaccacCCGAAAAAAAAGCGCAAAGTGGATCCGAAGAAAAAACGTAAAGTTGAAGATCCGAAAGACATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCCGCATGAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTTTTTATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGGTAGCCCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGAAATGCAGCGTTATGTGAAAGAAAATCAGACCCGCAACAAACATATTAACCCGAATGAATGGTGGAAAGTTTATCCGAGCAGCGTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAATCGTAAAACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCACCCTGACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggggaaagtccaaatcataaatattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcgcataatgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgcagaacatccctttatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgttattcaaccaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgttcgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtcttcaaaaacaatttcctaatgaatcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaattgggtttgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttgctgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggcccatccgcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgcaccatcatggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaattcagaaatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaataatcaagcatctattcttgaacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttcaaaggggccggagtctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcccgatgtactggcgaatgttaatttttcccaggctgatgaggtaatagctgatttacgagacaaactt
catcaattgtgtgaaatgctatttaatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgctatacaaacatttacgggagttaaaaccgcaaggagggccatcaaatggctga>protein sequence of His₆-Dual-monopartite NLS SV40-FokI-(Sharkey-KKR)-Cse1(“His₆” disclosed as SEQ ID NO: 48) [SEQ ID NO: 23]MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNRKTNVNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFADPTNRAKGLEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASILERHHDVLMFNQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG*>nucleotide sequence of His₆-Dual-monopartite NLS SV40-FokI (Sharkey-ELD)-Cse1(“His₆” disclosed as SEQ ID NO: 48) [SEQ ID NO: 
24]ATGcatcaccatcatcaccacCCGAAAAAAAAGCGCAAAGTGGATCCGAAGAAAAAACGTAAAGTTGAAGATCCGAAAGACATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCCGCATGAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTTTTTATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGGTAGCCCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGAAATGGAACGTTATGTGGAAGAAAATCAGACCCGTGATAAACATCTGAATCCGAATGAATGGTGGAAAGTTTATCCGAGCAGCGTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAATCATATTACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCACCCTGACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAGGCCTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggggaaagtccaaatcataaatctgcaatcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacgatatggaactggccgctttagcactgctggtttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcgcataatgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgcagaacatccctttatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgttattcaaccaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgttcgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtcttcaaaaacaatttcctaatgaatcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaattgggtttgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttgctgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggccccatccgcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgcaccatcatggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaattcagaaatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaataatcaagcatctattcttgaacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttcaaaggggccggagtctctgttcatgagactgcagaaaggcatttcta
tcgacagagtgaattattaattcccgatgtactggcgaatgttaatttttcccaggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtgaaatgctatttaatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgctatacaaacatttacgggagttaaaaccgcaaggagggccatcaaatggctga>protein sequence of His₆-Dual-monopartite NLS SV40-FokI-(Sharkey-ELD)-Cse1(“His₆” disclosed as SEQ ID NO: 48) [SEQ ID NO: 25]MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMERYVEENQTRDKHLNPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFADPTNRAKGLEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASILERRHDVLMFNQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG*

DNA Cleavage Assay

The specificity and activity of the complexes were tested using an artificially constructed target plasmid as a substrate. This plasmid contains M13 and P7 binding sites on opposing strands such that both FokI domains face each other (see FIG. 11). The distance between the Cascade binding sites varies between 25 and 50 base pairs in 5 bp increments. As the binding sites of Cascade need to be flanked by any of four known PAM sequences (5′-protospacer-CTT/CAT/CTC/CCT-3′), this distance range gives sufficient flexibility to design such a pair for almost any given sequence.
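
The PAM constraint and the tested spacings can be illustrated with a minimal sketch. It scans a single strand, read 5′ to 3′, for windows that are directly followed by one of the four PAMs named above. The function names, the toy sequence and the 32-nt protospacer length are assumptions made for illustration only; the four PAMs and the 25-50 bp range are taken from the text.

```python
# Hypothetical helpers; not part of the described method.
PAMS = {"CTT", "CAT", "CTC", "CCT"}   # the four PAMs named in the text
TESTED_SPACINGS = range(25, 51, 5)    # 25, 30, ..., 50 bp, as on the target plasmids


def pam_flanked_sites(seq, protospacer_len=32):
    """Return start indices of windows directly followed (3') by a known PAM.

    protospacer_len=32 is an assumption made for this sketch.
    """
    seq = seq.upper()
    hits = []
    # The last valid start leaves room for the window plus a 3-nt PAM.
    for start in range(len(seq) - protospacer_len - 2):
        pam = seq[start + protospacer_len:start + protospacer_len + 3]
        if pam in PAMS:
            hits.append(start)
    return hits


def spacing_was_tested(distance_bp):
    """True if the spacing is one of the distances present on the target plasmids."""
    return distance_bp in TESTED_SPACINGS


toy = "A" * 32 + "CTTGGG"             # toy sequence, not from the document
print(pam_flanked_sites(toy))
print(spacing_was_tested(35), spacing_was_tested(33))
```

A real pair design would also scan the reverse complement for the partner site, since the two binding sites lie on opposing strands.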

The sequences of the target plasmids used are as follows. The number indicates the distance between the M13 and P7 target sites. Protospacers are shown in bold, PAMs underlined:

>50 bp [SEQ ID NO: 26] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAATACCGT CTTGCTTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAG GCCTCGTTCCGAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATA GGCGGCCTTTAACTCggatcc
>45 bp [SEQ ID NO: 27] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAATACCGT CTTTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCC TCGTTCAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGG CCTTTAACTCggatcc
>40 bp [SEQ ID NO: 28] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAATACCGT CTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTC GAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTT AACTCggatcc
>35 bp [SEQ ID NO: 29] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAATACCGT CTTGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTAAG CTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTC ggatcc
>30 bp [SEQ ID NO: 30] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAATACCGT CTTGCTAGCTCTAGAACTAGTCCTCAGCCTAGGAAG CTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCggatcc
>25 bp [SEQ ID NO: 31] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACAATACCGT CTTCTCTAGAACTAGTCCTCAGCCTAGGAAG CTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCggatcc

Cleavage of the target plasmids was analysed on agarose gels, where negatively supercoiled (nSC) plasmid can be distinguished from linearized or nicked plasmid. The cleavage site of the Cascade^(KKR/ELD) pair in a target vector was determined by isolating linear cleavage products from an agarose gel and filling in the recessed 3′ ends left by FokI cleavage with the Klenow fragment of E. coli DNA polymerase to create blunt ends. The linear vector was self-ligated, transformed, amplified, isolated and sequenced. Filling in of recessed 3′ ends and re-ligation leads to extra nucleotides in the sequence that represent the overhang left by FokI cleavage. By aligning the sequence reads to the original sequence, the cleavage sites can be found on a clonal level and mapped. Below, the additional bases incorporated into the sequence after filling in recessed 3′ ends left by FokI cleavage are underlined:

Reading from top to bottom, the 5′-3′ sequences above are SEQ ID NOs: 32-35, respectively.
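
The mapping logic of the fill-in and re-ligation procedure can be sketched as follows. A staggered cut leaves a 5′ overhang; filling in both recessed 3′ ends and joining the blunt ends duplicates the overhang bases, so the position of the duplication in a sequence read reveals both cut positions. This is a minimal sketch on a linear stretch of sequence; the function names, the coordinate convention (cut positions counted on the top strand) and the toy sequence are assumptions for illustration.

```python
# Hypothetical model of fill-in and self-ligation; not the authors' analysis code.

def fill_in_and_religate(seq, top_cut, bottom_cut):
    """Return the sequence after a staggered cut, Klenow fill-in and blunt re-ligation.

    top_cut and bottom_cut are cut positions in top-strand coordinates, with
    top_cut <= bottom_cut, so seq[top_cut:bottom_cut] is the 5' overhang.
    After fill-in, the left end carries seq[:bottom_cut] and the right end
    carries seq[top_cut:], so the overhang bases appear twice.
    """
    if not 0 <= top_cut <= bottom_cut <= len(seq):
        raise ValueError("cut positions must satisfy 0 <= top_cut <= bottom_cut <= len(seq)")
    return seq[:bottom_cut] + seq[top_cut:]


def map_cleavage(original, read):
    """Recover (top_cut, bottom_cut) from a read, or None if no duplication fits.

    Returns the first fit; repeats in the sequence can make this ambiguous.
    """
    extra = len(read) - len(original)
    if extra < 1:
        return None  # a blunt cut leaves no trace after re-ligation
    for top_cut in range(len(original) - extra + 1):
        if fill_in_and_religate(original, top_cut, top_cut + extra) == read:
            return top_cut, top_cut + extra
    return None


original = "AAACCGGTTT"                       # toy sequence, not from the document
read = fill_in_and_religate(original, 3, 7)   # 4-nt overhang "CCGG"
print(read, map_cleavage(original, read))
```

In this toy case the four duplicated bases correspond to the bases that would be underlined in a clone sequence.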

Cleavage of a Target Locus in Human Cells

The human CCR5 gene encodes the C-C chemokine receptor type 5 protein, which serves as the receptor for the human immunodeficiency virus (HIV) on the surface of white blood cells. The CCR5 gene is targeted using a pair of Cascade^(KKR/ELD) nucleases in addition to an artificial GFP locus. A suitable binding site pair is selected on the coding region of CCR5. Two separate CRISPR arrays containing uniform spacers targeting each of the binding sites are constructed using DNA synthesis (Geneart).

The human CCR5 target gene selection and CRISPR designs used are as follows:

>Part of genomic human CCR5 sequence, containing whole ORF (position 347-1446). [SEQ ID NO: 36] GGTGGAACAAGATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAATTATTATACATCGGAGCCCTGCCAAAAAATCAATGTGAAGCAAATCGCAGCCCGCCTCCTGCCTCCGCTCTACTCACTGGTGTTCATCTTTGGTTTTGTGGGCAACATGCTGGTCATCCTCATCCTGATAAACTGCAAAAGGCTGAAGAGCATGACTGACATCTACCTGCTCAACCTGGCCATCTCTGACCTGTTTTTCCTTCTTACTGTCCCCTTCTGGGCTCACTATGCTGCCGCCCAGTGGGACTTTGGAAATACAATGTGTCAACTCTTGACAGGGCTCTATTTTATAGGCTTCTTCTCTGGAATCTTCTTCATCATCCTCCTGACAATCGATAGGTACCTGGCTGTCGTCCATGCTGTGTTTG CTTTAAAAGCCAGGACGGTCACCTTTGGGGTGGTGACAAG TGTGATCACTTGGGTGGTGGCTGTGTTTGCGTCTCTCCCAGGAATCATCTTTACCAGATCTCAAAAAGAAGGTCTTCATTACACCTGCAGCTCTCATTTTCCATACAGTCAGTATCAATTCTGGAAGAATTTCCAGACATTAAAGATAGTCATCTTGGGGCTGGTCCTGCCGCTGCTTGTCATGGTCATCTGCTACTCGGGAATCCTAAAAACTCTGCTTCGGTGTCGAAATGAGAAGAAGAGGCACAGGGCTGTGAGGCTTATCTTCACCATCATGATTGTTTATTTTCTCTTCTGGGCTCCCTACAACATTGTCCTTCTCCTGAACACCTTCCAGGAATTCTTTGGCCTGAATAATTGCAGTAGCTCTAACAGGTTGGACCAAGCTATGCAGGTGACAGAGACTCTTGGGATGACGCACTGCTGCATCAACCCCATCATCTATGCCTTTGTCGGGGAGAAGTTCAGAAACTACCTCTTAGTCTTCTTCCAAAAGCACATTGCCAAACGCTTCTGCAAATGCTGTTCTATTTTCCAGCAAGAGGCTCCCGAGCGAGCAAGCTCAGTTTACACCCGATCCACTGGGGAGCAGGAAATATCTGTGGGCTTGTGACACGGACTCAAGTGGGCTGGTGACCCAGTC
Red1/2: chosen target sites (distance: 34 bp, PAM 5′-CTT-3′). Red1 is the first underlined sequence in the above; Red2 is the second underlined sequence.
>CRISPR array red1 (italics = spacers, bold = repeats) [SEQ ID NO: 37] ccatggTAATACGACTCACTATAGGGAGAATTAGCTGATCTTTAATAATAAGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAATGCTTTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGG GGATAAACCGCAAACACAGCATGGACGACAGCCAGGTACCTA GAGTTC CCCGCGCCAGCGGGGATAAACCGCAAACACAGCATGGACGACAGCCAG GTACCTA GAGTTCCCCGCGCCAGCGGGGATAAACCGCAAACACAGCAT GGACGACAGCCAGGTACCTA GAGTTCCCCGCGCCAGCGGGGATAAACCGAAAACAAAAGGCTCAGTCGGAAGACTGGGCCTTTTGTTTTAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGggtacc
>CRISPR array red2 (italics = spacers, bold = repeats) [SEQ ID NO: 38] ccatggTAATACGACTCACTATAGGGAGAATTAGCTGATCTTTAATAATAAGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAATGCTTTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGG GGATAAACCGTGTGATCACTTGGGTGGTGGCTGTGTTTGCGT GAGTTC CCCGCGCCAGCGGGGATAAACCGTGTGATCACTTGGGTGGTGGCTGTG TTTGCGT GAGTTCCCCGCGCCAGCGGGGATAAACCGTGTGATCACTTG GGTGGTGGCTGTGTTTGCGT GAGTTCCCCGCGCCAGCGGGGATAAACCGAAAACAAAAGGCTCAGTCGGAAGACTGGGCCTTTTGTTTTAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGggtacc

Delivery of Cascade^(KKR/ELD) into the Nucleus of Human Cells

Cascade is very stable as a multi-subunit protein-RNA complex and is easily produced in mg quantities in E. coli. Transfection or micro-injection of the complex in its intact form as purified from E. coli is used as the method of delivery (see FIG. 12). As shown in FIG. 12, Cascade-FokI nucleases are purified from E. coli and encapsulated in protein transfection vesicles. These are then fused with the cell membrane of human HepG2 cells, releasing the nucleases into the cytoplasm (step 2). NLS sequences are then recognized by importin proteins, which facilitate nucleopore passage (step 3). Cascade^(KKR) (open rectangle) and Cascade^(ELD) (filled rectangle) will then find and cleave their target site (step 4), inducing DNA repair pathways that will alter the target site, leading to the desired changes. Cascade^(KKR/ELD) nucleases need to act only once and require no permanent presence in the cell encoded on DNA.

To deliver Cascade into human cells, protein transfection reagents are used from various sources including Pierce, NEB, Fermentas and Clontech. These reagents have recently been developed for the delivery of antibodies, and are useful to transfect a broad range of human cell lines with efficiencies up to 90%. Human HepG2 cells are transfected. Other cell lines, including CHO-K1, COS-7, HeLa, and non-embryonic stem cells, are also transfected.

To import the Cascade^(KKR/ELD) nuclease pair into the nucleus, a tandem monopartite nuclear localisation signal (NLS) from the large T-antigen of simian virus 40 (SV40) is fused to the N-terminus of FokI. This ensures import of only intact Cascade^(ELD/KKR) into the nucleus. (The nuclear pore complex translocates RNA polymerases (550 kDa) and other large protein complexes.) Prior to transfection, the nuclease activity of the Cascade^(KKR/ELD) nuclease pair is checked in vitro using purified complexes and CCR5 PCR amplicons to exclude transfecting non-productive Cascade^(KKR/ELD) nuclease pairs.

Surveyor Assay

Transfected cells are cultivated and passaged for several days. The efficiency of in vivo target DNA cleavage is then assessed by using the Surveyor assay of Guschin, D. Y., et al. (2010) Methods Mol. Biol., 649: 247-256. Briefly, PCR amplicons of the target DNA locus are mixed 1:1 with PCR amplicons from untreated cells. These are heated and allowed to anneal, giving rise to mismatches at target sites that have been erroneously repaired by NHEJ. A mismatch nuclease is then used to cleave only mismatched DNA molecules, giving a maximum of 50% cleavage when target DNA cleavage by Cascade^(KKR/ELD) is complete. This procedure is then followed up by sequencing of the target DNA amplicons of treated cells. The assay allows for rapid assessment and optimization of the delivery procedure.
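
The 50% ceiling follows from random re-annealing of the mixed strands, and can be reproduced with a short worked example. This is a minimal sketch under two stated assumptions: strands pair at random, and all modified amplicons carry the same change, so that only modified/unmodified duplexes contain a mismatch. The function name and parameters are hypothetical.

```python
# Hypothetical worked example of the Surveyor arithmetic; not from the cited protocol.

def heteroduplex_fraction(modified_fraction, treated_share=0.5):
    """Expected fraction of mismatched duplexes after melting and re-annealing.

    modified_fraction: fraction of amplicons from treated cells that carry an
        NHEJ-induced change (1.0 means cleavage and mis-repair were complete).
    treated_share: share of treated-cell amplicons in the mix (0.5 for 1:1).
    Assumes random strand pairing and a single type of modified allele.
    """
    if not (0.0 <= modified_fraction <= 1.0 and 0.0 <= treated_share <= 1.0):
        raise ValueError("fractions must lie between 0 and 1")
    p = modified_fraction * treated_share   # modified strands in the whole mix
    return 2 * p * (1 - p)                  # modified strand paired with unmodified strand


for f in (0.0, 0.5, 1.0):
    print(f, heteroduplex_fraction(f))
```

With complete modification and a 1:1 mix, half of all strands are modified, giving 2 × 0.5 × 0.5 = 0.5, which matches the stated maximum of 50% cleavage.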

Production of Cascade-Nuclease Pairs

The Cascade-nuclease complexes were constructed as explained above. Affinity purification from E. coli using the StrepII-tagged Cse2 subunit yields a complex with the expected stoichiometry when compared to native Cascade. FIG. 13 shows the stoichiometry of native Cascade (1), Cascade^(KKR) with P7 crRNA and Cascade^(ELD) with M13 crRNA 24 h after purification using only Streptactin. Bands in native Cascade (1) are from top to bottom: Cse1, Cas7, Cas5, Cas6e, Cse2. Cascade^(KKR/ELD) show the FokI-Cse1 fusion band and an additional band representing Cse1 with a small part of FokI as a result of proteolytic degradation.

Apart from an intact FokI-Cse1 fusion protein, we observed that a fraction of the FokI-Cse1 fusion protein is proteolytically cleaved, resulting in a Cse1 protein with only the linker and a small part of FokI attached to it (as confirmed by mass spectrometry, data not shown). In most protein isolations the fraction of degraded fusion protein is approximately 40%. The isolated protein is stably stored in the elution buffer (20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 4 mM desthiobiotin) with additional 0.1% Tween 20 and 50% glycerol at −20° C. Under these storage conditions, integrity and activity of the complex have been found stable for at least three weeks (data not shown).

Introduction of a His₆-Tag (SEQ ID NO: 48) and NLS to the Cascade-Nuclease

The Cascade nuclease fusion design was modified to incorporate a Nuclear Localization Signal (NLS) to enable transport into the nucleus of eukaryotic cells. For this, a tandem monopartite NLS from the large T-antigen of simian virus 40 (SV40) (sequence: PKKKRKVDPKKKRKV) (SEQ ID NO: 49) was translationally fused to the N-terminus of the FokI-Cse1 fusion protein, directly preceded by a His₆-tag at the N-terminus. The His₆-tag (sequence: MHHHHHH) (“His₆” disclosed as SEQ ID NO: 48 and “MHHHHHH” disclosed as SEQ ID NO: 50) allows for an additional Ni²⁺-resin affinity purification step after StrepII purification. This additional step ensures the isolation of only full-length Cascade-nuclease fusion complex, and increases the efficiency of cleavage by preventing non-intact Cascade complexes from binding to the target site and forming an unproductive nuclease pair.
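
The N-terminal layout described here (His₆-tag, then tandem NLS, then the rest of the fusion) can be checked against a listed sequence with a few lines of code. This is a minimal sketch; the function name is hypothetical, and the test string is only the first 40 residues of SEQ ID NO: 23 as listed earlier in this document.

```python
# Hypothetical check of the N-terminal layout; illustration only.
HIS_TAG = "MHHHHHH"              # SEQ ID NO: 50
TANDEM_NLS = "PKKKRKVDPKKKRKV"   # SEQ ID NO: 49

# First 40 residues of SEQ ID NO: 23, copied from the sequence listing above.
SEQ23_START = "MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKK"


def n_terminal_layout(protein):
    """Split a fusion protein into (His tag, NLS, remainder).

    Returns None when the protein does not start with the tag followed by the NLS.
    """
    prefix = HIS_TAG + TANDEM_NLS
    if not protein.startswith(prefix):
        return None
    return HIS_TAG, TANDEM_NLS, protein[len(prefix):]


print(n_terminal_layout(SEQ23_START))
print(n_terminal_layout("MAQLVKSELEEKK"))   # untagged FokI start: no tag or NLS
```

In the listed sequence a short stretch (EDPKD) separates the NLS from the first methionine of the FokI portion.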

In Vitro Cleavage Assay

Cascade^(KKR/ELD) activity and specificity were assayed in vitro as described above. FIG. 14A shows plasmids with distances between protospacers of 25-50 bp (5 bp increments, lanes 1-6) incubated with Cascade^(KKR/ELD) for 30 minutes at 37° C. Lane 10 contains the target plasmid in its three possible topologies: the lowest band represents the initial, negatively supercoiled (nSC) form of the plasmid, the middle band represents the linearized form (cleaved by XbaI), whilst the upper band represents the open circular (OC) form (after nicking with Nt.BbvCI). Lane 7 shows incubation of a plasmid with both binding sites removed (negative control). FIG. 14A therefore shows a typical cleavage assay using various target plasmids in which the binding sites are separated by 25 to 50 base pairs in 5 bp increments (lanes 1 to 6). These plasmids were incubated with Cascade^(KKR/ELD) carrying anti-P7 and anti-M13 crRNA, respectively. A plasmid containing no binding sites served as a control (lane 7). The original plasmid exists in negatively supercoiled form (nSC, control lane 8), and nicked or linearized products are clearly distinguishable. Upon incubation a linear cleavage product is formed when binding sites are separated by 30, 35 and 40 base pairs (lanes 2, 3, 4). At 25, 45 and 50 base pairs distance (lanes 1, 5, 6), the target plasmid appeared to be incompletely cleaved, leading to the nicked form (OC). These results show the best cleavage in plasmids with distances between 30 and 40 bp, giving sufficient flexibility when designing a crRNA pair for any given locus. Both shorter and longer distances result in increased nicking activity while creating fewer DSBs. There is very little activity on a plasmid where the two protospacers have been removed, showing target specificity (lane 7).

Cleavage Conditions

To assess the optimal buffer conditions for cleavage assays, and to estimate whether activity of the complex is expected at physiological conditions, the following two buffers were selected: (1) NEB4 (New England Biolabs, 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, pH 7.9) and (2) Buffer O (Fermentas, 50 mM Tris-HCl, 10 mM MgCl₂, 100 mM NaCl, 0.1 mg/mL BSA, pH 7.5). Of the two, NEB4 is recommended for optimal activity of the commercial intact FokI enzyme. Buffer O was chosen from a quick screen to give good activity and specificity (data not shown). FIG. 14B shows incubation with different buffers and different incubation times. Lanes 1-4 were incubated with Fermentas Buffer O (lanes 1, 2 for 15 minutes; lanes 3, 4 for 30 minutes), and lanes 5, 6 were incubated with NEB4 (30 minutes). Lanes 1, 3, 5 used the target plasmid with 35 bp spacing; lanes 2, 4, 6 used the non-target plasmid (no binding sites). Lanes 7, 8 were incubated with only Cascade^(KKR) or Cascade^(ELD), respectively (Buffer O). Lane 9 is the topology marker as in FIG. 14A. Lanes 10 and 11 show the target and non-target plasmid incubated without addition of Cascade. In FIG. 14B, activity was therefore tested on the target plasmid with 35 base pairs distance (lanes 1, 3, 5) and a non-target control plasmid (lanes 2, 4, 6). There was a high amount of unspecific nicking and less cleavage in NEB4 (lanes 5, 6), whilst Buffer O shows activity only on the target plasmid, with a high amount of specific cleavage and little nicking (lanes 1-4). The difference is likely caused by the NaCl concentration in Buffer O: higher ionic strength weakens protein-protein interactions, leading to less nonspecific activity. Incubation for 15 or 30 minutes shows little difference in both target and non-target plasmid (lanes 1, 2 or 3, 4, respectively). Addition of only one type of Cascade (P7^(KKR) or M13^(ELD)) does not result in cleavage activity (lanes 7, 8), as expected.
This experiment shows that specific Cascade nuclease activity by a designed pair occurs when the NaCl concentration is at least 100 mM, which is near the physiological saline concentration inside cells (137 mM NaCl). The Cascade nuclease pair is expected to be fully active in vivo, in eukaryotic cells, while displaying negligible off-target cleavage activity.

Cleavage Site

The site of cleavage in the target plasmid with a spacing of 35 bp (pTarget35) was determined. FIG. 15 shows how sequencing reveals up- and downstream cleavage sites by Cascade^(KKR/ELD) in the target plasmid with 35 base pair spacing. FIG. 15A shows the target region within pTarget35 with annotated potential cleavage sites. Parts of the protospacers are indicated in red and blue. In FIG. 15B, the bar chart shows four different cleavage patterns and their relative abundance within sequenced clones. The blue bars represent the generated overhang, while the left and right border of each bar represents the left and right cleavage site (see B for annotation).

FIG. 15A shows the original sequence of pTarget35, with numbered cleavage sites from −7 to +7, where 0 lies in the middle between the two protospacers (indicated in red and blue). Seventeen clones were sequenced and these all show cleavage around position 0, creating varying overhangs of between 3 and 5 bp (see FIG. 15B). Overhangs of 4 bp are most abundant (cumulatively 88%), while overhangs of 3 and 5 bp occur only once each (6% each). The cleavage occurred exactly as expected, with no clones showing off-target cleavage.
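
The reported percentages are consistent with the clone counts, as a short calculation shows. The count of fifteen clones with a 4 bp overhang is inferred here from the stated total of seventeen clones and the two single occurrences; it is not given explicitly in the text. The function name is hypothetical.

```python
from collections import Counter

# Overhang length per sequenced clone. The value 15 is inferred (17 clones in
# total, minus one clone each with a 3 bp and a 5 bp overhang); it is an assumption.
overhangs = [4] * 15 + [3] + [5]


def overhang_percentages(lengths):
    """Return {overhang length: rounded percentage of clones}."""
    counts = Counter(lengths)
    total = len(lengths)
    return {length: round(100 * n / total) for length, n in sorted(counts.items())}


print(overhang_percentages(overhangs))
```

Rounded to whole percentages, 15/17 gives 88% and 1/17 gives 6%, in line with the figures quoted above.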

Cleaving a Target Locus in Human Cells

Cascade^(KKR/ELD) nucleases were successfully modified to contain an N-terminal His₆-tag (SEQ ID NO: 48) followed by a dual monopartite Nuclear Localisation Signal. These modified Cascade nuclease fusion proteins were co-expressed with either one of two synthetically constructed CRISPR arrays, each targeting a binding site in the human CCR5 gene. First, the activity of this new nuclease pair is validated in vitro by testing the activity on a plasmid containing this region of the CCR5 gene. The nuclease pair is then transfected into a human cell line, e.g. a HeLa cell line. Efficiency of target cleavage is assessed using the Surveyor assay as described above.

1-46. (canceled)
47. A composition comprising a designed fusion of (i) at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) a nuclease, or a mutant or an active portion thereof.
48. The composition of claim 47, wherein the nuclease is a ribonuclease, or a mutant or an active portion thereof.
49. The composition of claim 48, wherein the ribonuclease is an endonuclease, a 3′ exonuclease or a 5′ exonuclease.
50. The composition of claim 47, wherein the CRISPR-associated protein subunit is selected from the group consisting of Cas6, Cas5, Cse1, Cse2 and Cas7.
51. The composition of claim 50, wherein the CRISPR-associated protein subunit is Cse1.
52. The composition of claim 47, wherein the nuclease is an endonuclease, or mutant or an active portion thereof.
53. The composition of claim 52, wherein the endonuclease comprises FokI or a modified FokI.
54. The composition of claim 53, wherein the modified FokI is KKR Sharkey or ELD Sharkey or a combination thereof.
55. The composition of claim 47, wherein the nuclease is a FokI KKR Sharkey, further comprising a second composition comprising a second designed fusion of (i) at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) a nuclease, or a mutant or an active portion thereof, wherein the nuclease is a FokI ELD Sharkey.
56. The composition of claim 47, further comprising a second composition comprising a second designed fusion of (i) at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) a nuclease, or a mutant or an active portion thereof.
57. The composition of claim 50, wherein the nuclease comprises FokI or a modified FokI.
58. The composition of claim 57, wherein the modified FokI is KKR Sharkey or ELD Sharkey or a combination thereof.
59. The composition of claim 51, wherein the nuclease comprises FokI or a modified FokI.
60. The composition of claim 59, wherein the modified FokI is KKR Sharkey or ELD Sharkey or a combination thereof.
61. The composition of claim 47, wherein the designed fusion further comprises a linker polypeptide between (i) the clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) the nuclease, or the mutant or the active portion thereof.
62. The composition of claim 47, wherein the clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit and (ii) the nuclease, or the mutant or the active portion thereof, are covalently linked by chemical synthesis.
63. One or more nucleic acid molecules encoding the composition of claim 47.
64. One or more expression vectors comprising the nucleic acid molecules of claim 63.